Abstract

Image data are often composed of two or more geometrically distinct constituents; 
in galaxy catalogs, for instance, one sees a mixture of pointlike structures (galaxy su- 
perclusters) and curvelike structures (filaments). It would be ideal to process a single 
image and extract two geometrically 'pure' images, each one containing features from 
only one of the two geometric constituents. This seems to be a seriously underdeter- 
mined problem, but recent empirical work achieved highly persuasive separations. 

We present a theoretical analysis showing that accurate geometric separation of point 
and curve singularities can be achieved by minimizing the $\ell_1$ norm of the representing
coefficients in two geometrically complementary frames: wavelets and curvelets. Driving
our analysis is a specific property of the ideal (but unachievable) representation where 
each content type is expanded in the frame best adapted to it. This ideal representation 
has the property that important coefficients are clustered geometrically in phase space, 
and that at fine scales, there is very little coherence between a cluster of elements in 
one frame expansion and individual elements in the complementary frame. We formally 
introduce notions of cluster coherence and clustered sparsity and use this machinery 
to show that the underdetermined systems of linear equations can be stably solved 
by $\ell_1$ minimization; microlocal phase space helps organize the calculations that cluster
coherence requires. 

Key Words. $\ell_1$ minimization. Sparse Representation. Mutual Coherence. Cluster
Coherence. Tight Frames. Curvelets, Shearlets, Radial Wavelets. 
1 Introduction



Cosmological data analysts face tasks of geometric separation [39, 40]. Gravitation, acting 
over time, drives an initially quasi-uniform distribution of matter in 3D to concentrate near 
lower-dimensional structures: points, filaments, and sheets. It would be desirable to process 
single 'maps' of matter density and somehow extract three 'pure' maps containing just the 
points, just the filaments, and just the sheets around which matter is concentrating. 

In seemingly unrelated fields, such as medical imaging and materials science, related 
questions arise frequently and naturally. For example, a technologist with a single confused 
image of an aggregate might wish to create two images, one containing just the fibrous and 
the other just the granular structures, respectively. 

Such 'desires', when voiced by a working scientist or engineer, really amount to a request 
for existing information technology to be put to work here and now on data available today. 
No doubt there is a wide spectrum of image processing 'hacks' and 'improvisations' that 
might be useful, on a case-by-case basis. The mathematician's interest would only be piqued 
when an intellectually coherent approach shows promise of success, especially if the reasons 
for success are subtle and instructive. 

Recently, astronomer Jean-Luc Starck and collaborators have been empirically successful 
in numerical experiments with component separation; their approach used tools from mod- 
ern harmonic analysis in a provocative way. They used two or more overcomplete frames, 
each one specially adapted to particular geometric structures, and were able to obtain sep- 
aration despite the fact that the underlying system of equations is highly underdetermined. 
Here we analyze such approaches in a mathematical framework where we can show that 
success stems from an interplay between geometric properties of the objects to be separated, 
and the harmonic analysis for singularities of various geometric types. We eventually point 
to a much wider range of seemingly very different 'imaging' problems where our analysis 
techniques can provide insight. 

1.1 Singularities and Sparsity 

As a mathematical idealization of 'image', consider a Schwartz distribution $f$ with domain
$\mathbb{R}^2$. The distribution $f$ will be given singularities with specified geometry: points and
curves. 

We plan to represent such an 'image' using tools of harmonic analysis; in particular, 
bases and frames. While many such representations are conceivable, we are interested here 
just in those bases or frames which can sparsely represent $f$, i.e., can represent $f$ using
relatively few large coefficients. 

The type of basis which best sparsifies $f$ depends on the geometry of its singularities.
If the singularities occur at a finite number of (variable) points, then wavelets give what 
is, roughly speaking, an optimally sparse representation - one with the fewest significantly 
nonzero coefficients. If the singularities occur at a finite number of smooth curves, then one 
of the recently studied directional multiscale representations (curvelets or shearlets) will do
the best job of sparsification. (For careful quantitative discussions of sparsification see, e.g., 
[6] etc.). 

In fact, real-world signals are, generally speaking, a mixture of content types and,
correspondingly, a model where singularities are of only one geometric type is overly narrow.
If $f$ is actually a nontrivial superposition $\mathcal{P} + \mathcal{C}$ where $\mathcal{P}$ has only point singularities and $\mathcal{C}$
has only curvilinear singularities, then two things happen:

• Neither wavelets alone nor curvelets alone will be very good for representing $\mathcal{P} + \mathcal{C}$.
The sparsity either achieves alone is much less satisfactory than the ideal sparsity level,
namely that which could be achieved by using wavelets to represent $\mathcal{P}$ and curvelets
to represent $\mathcal{C}$. (This ideal representation is purely notional; it assumes one can
first perfectly separate the two objects and then separately analyze the separated
layers.)

• In fact, no single basis or traditional linear representation is very good at sparsifying
$\mathcal{P} + \mathcal{C}$ compared to the ideal representation.

This immediately suggests the need to use both systems to represent $f$ sparsely; however,
since each system is itself complete (or even overcomplete) there is no obvious traditional 
way to do this. 

In this paper, we consider the problem of developing sparse representations by combining
both wavelets and curvelets and using a nonlinear representation based on $\ell_1$ minimization.
The problem we solve is a continuum variant of a problem in image and signal processing
that has attracted considerable practical interest and extensive work for almost two decades. For
references, see Subsection 1.6 below. That work, while suggestive and inspiring, concerns
discretely indexed signal/image processing, obscuring the continuum elements of geometry 
and microlocal analysis which are essential to this paper. 



1.2 A Geometric Separation Problem 

Consider the following simple but clear model problem of geometric separation. Consider
a 'pointlike' object $\mathcal{P}$ made of point singularities:

$$\mathcal{P} = \sum_{i=1}^{P} |x - x_i|^{-3/2}. \tag{1.1}$$

This object is smooth away from the $P$ given points $\{x_i : 1 \le i \le P\}$. Consider as well a
'curvelike' object $\mathcal{C}$, a singularity along a closed curve $\tau : [0, 1] \to \mathbb{R}^2$:

$$\mathcal{C} = \int \delta_{\tau(t)}(\cdot)\, dt, \tag{1.2}$$

where $\delta_x$ is the usual Dirac delta function located at $x$. The singularities underlying these
two distributions are geometrically quite different, but the exponent $3/2$ is chosen so the
energy distribution across scales is similar; if $A_r$ denotes the annular region $r < |\xi| < 2r$,

$$\int_{A_r} |\hat{\mathcal{P}}|^2(\xi) \asymp r, \qquad \int_{A_r} |\hat{\mathcal{C}}|^2(\xi) \asymp r, \qquad r \to \infty. \tag{1.3}$$

This choice makes the components comparable as we go to finer scales; the ratio of energies 
is more or less independent of scale. Separation is challenging at every scale. 
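
As a hedged sanity check of the first estimate in (1.3), the following short computation treats the simplest case only: a single point singularity at the origin ($P = 1$, $x_1 = 0$; this simplification is an assumption made here for illustration). It uses the standard fact that in $\mathbb{R}^2$ the Fourier transform of $|x|^{-a}$, $0 < a < 2$, is a constant multiple of $|\xi|^{a-2}$; the constant $c$ is left unspecified.

```latex
\widehat{\mathcal{P}}(\xi) = c\,|\xi|^{-1/2}
\quad\Longrightarrow\quad
\int_{A_r} |\widehat{\mathcal{P}}(\xi)|^2 \, d\xi
  = c^2 \int_0^{2\pi}\!\!\int_r^{2r} \rho^{-1}\,\rho \, d\rho \, d\theta
  = 2\pi c^2 \, r .
```

The energy in the annulus therefore grows linearly in $r$, which is the scaling asserted for the pointlike component.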
Now assume that we observe the 'Signal'

$$f = \mathcal{P} + \mathcal{C}; \tag{1.4}$$

however, the component distributions $\mathcal{P}$ and $\mathcal{C}$ are unknown to us.

Definition 1.1 The Geometric Separation Problem is to recover $\mathcal{P}$ and $\mathcal{C}$ from knowledge
only of $f$; here $\mathcal{P}$ and $\mathcal{C}$ are unknown to us, but obey (1.1), (1.2) and certain regularity
conditions on the curve $\tau$.

As there are two unknowns ($\mathcal{P}$ and $\mathcal{C}$) and only one observation ($f$), the problem seems
improperly posed. We develop a principled, rational approach which provably solves the 
problem according to clearly stated standards. 

1.3 Two Geometric Frames 

We now focus on two overcomplete systems for representing the object $f$:

• Radial Wavelets - a tight frame with perfectly isotropic generating elements. 

• Curvelets - a highly directional tight frame with increasingly anisotropic elements at 
fine scales. 

We pick these because, as is well known, point singularities are coherent in the wavelet 
frame and curvilinear singularities are coherent in the curvelet frame. In Section 8.1 we 
discuss other system pairs. For readers not familiar with frame theory, we refer to [11], 
where terms like 'tight frame' - a Parseval-like property - are carefully discussed. 

The point- and curvelike objects we defined in the previous subsection are real-valued
distributions. Hence, for deriving sparse expansions of those, we will consider radial wavelets
and curvelets consisting of real-valued functions. So only orientations given by angles
$\theta \in [0, \pi)$ (in radians) will be considered, which later on we will, as is customary, identify with $\mathbb{P}^1$, the
real projective line.

We now construct the two selected tight frames as follows. Let $W(r)$ be an 'appropriate'
window function, where in the following we assume that $W$ belongs to $C^\infty(\mathbb{R})$ and is
compactly supported on $[-2, -1/2] \cup [1/2, 2]$ while being the Fourier transform of a wavelet.
For instance, suitably scaled Lemarié-Meyer wavelets possess these properties. We define
continuous radial wavelets at scale $a > 0$ and spatial position $b \in \mathbb{R}^2$ by their Fourier
transforms

$$\hat{\psi}_{a,b}(\xi) = a \cdot W(a|\xi|) \cdot \exp\{i b'\xi\}.$$

The wavelet tight frame is then defined as a sampling of $b$ on a series of regular lattices
$\{a_j \mathbb{Z}^2\}$, $j \ge j_0$, where $a_j = 2^{-j}$, i.e., the radial wavelets at scale $j$ and spatial position
$k = (k_1, k_2)'$ are given by the Fourier transform

$$\hat{\psi}_\lambda(\xi) = 2^{-j} \cdot W(|\xi|/2^j) \cdot \exp\{i k'\xi/2^j\},$$

where we let $\lambda = (j, k)$ index scale and position.
For the same window function $W$ and a 'bump function' $V$, we define continuous
curvelets at scale $a > 0$, orientation $\theta \in [0, \pi)$, and spatial position $b \in \mathbb{R}^2$ by their Fourier
transforms

$$\hat{\gamma}_{a,b,\theta}(\xi) = a^{3/4} \cdot W(a|\xi|)\, V(a^{-1/2}(\omega - \theta)) \cdot \exp\{i b'\xi\},$$

with $\omega$ the polar angle of $\xi$. See [7] for more details. The curvelet tight frame is then (essentially) defined as a sampling
of $b$ on a series of regular lattices

$$\{R_{\theta_{j,\ell}} D_{a_j} \mathbb{Z}^2\}, \qquad j \ge j_0, \quad \ell = 0, \ldots, 2^{\lfloor j/2 \rfloor} - 1, \tag{1.5}$$

where $R_\theta$ is planar rotation by $\theta$ radians, $a_j = 2^{-j}$, $\theta_{j,\ell} = \pi\ell/2^{j/2}$, $\ell = 0, \ldots, 2^{\lfloor j/2 \rfloor} - 1$, and
$D_a$ is anisotropic dilation by $\mathrm{diag}(a, \sqrt{a})$, i.e., the curvelets at scale $j$, orientation $\ell$, and
spatial position $k = (k_1, k_2)$ are given by the Fourier transform

$$\hat{\gamma}_\eta(\xi) = 2^{-3j/4} \cdot W(|\xi|/2^j)\, V((\omega - \theta_{j,\ell})\, 2^{j/2}) \cdot \exp\{i (R_{\theta_{j,\ell}} D_{2^{-j}} k)'\xi\},$$

where we let $\eta = (j, k, \ell)$ index scale, position, and orientation. (For a precise statement, see [8,
Section 4.3, pp. 210-211].)

Roughly speaking, the radial wavelets are 'radial bumps' with position $k/2^j$ and scale
$2^{-j}$, while the curvelets live on anisotropic regions of width $2^{-j}$ and length $2^{-j/2}$. The
wavelets are good at representing point singularities while the curvelets are good at repre- 
senting curvilinear singularities. 

Using the same window $W$, we can construct a family of filters $F_j$ with transfer functions

$$\hat{F}_j(\xi) = W(|\xi|/2^j), \qquad \xi \in \mathbb{R}^2.$$

These filters allow us to decompose a function $f$ into pieces $f_j$ with different scales; the
piece $f_j$ at subband $j$ arises from filtering $f$ using $F_j$:

$$f_j = F_j \star f;$$

the Fourier transform $\hat{f}_j$ is supported in the annulus with inner radius $2^{j-1}$ and outer radius
$2^{j+1}$. Because of our assumption on $W$, we can reconstruct the original function from these
pieces using the formula

$$f = \sum_j F_j \star f_j, \qquad f \in L^2(\mathbb{R}^2).$$

The tight frames of curvelets and radial wavelets discussed above interact in a very local
way with the filtering $F_j$.

Lemma. Let $\mathcal{F}_j$ denote the range of the operator of convolution with $F_j$. Then curvelets
at level $j'$ are orthogonal to $\mathcal{F}_j$ unless $|j' - j| \le 1$. Similarly, radial wavelets at level $j'$ are
orthogonal to $\mathcal{F}_j$ unless $|j' - j| \le 1$.

Proof. Indeed, $\mathcal{F}_j$ is the collection of all functions $f_j$ whose Fourier transform is
representable as $\hat{f}_j(\xi) = W(|\xi|/2^j)\hat{f}(\xi)$ where $f \in L^2(\mathbb{R}^2)$. The support in frequency space of
elements of $\mathcal{F}_j$ is thus an annulus $A_j$ (say). The annuli have disjoint interiors if $|j - j'| > 1$.
Hence $\mathcal{F}_j \perp \mathcal{F}_{j'}$ if $|j' - j| > 1$.

However, both the radial wavelet frame elements and the curvelet frame elements at
level $j'$ belong to $\mathcal{F}_{j'}$. □
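
The subband structure used in the Lemma can be checked numerically. The sketch below is illustrative only: it assumes a particular Meyer-type window built from a polynomial step (a hypothetical stand-in for $W$, and only finitely smooth rather than $C^\infty$), and the names `smooth_step`, `window`, `subband_weight`, and `overlap` are introduced here, not taken from the paper. It checks that the squared transfer functions sum to one, which is what the reconstruction formula needs, and that subbands two or more levels apart do not overlap in frequency.

```python
import math

def smooth_step(x):
    """Polynomial step rising from 0 (x <= 0) to 1 (x >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x**4 * (35.0 - 84.0 * x + 70.0 * x**2 - 20.0 * x**3)

def window(r):
    """Meyer-type radial window supported on 1/2 <= |r| <= 2 (assumed stand-in for W)."""
    r = abs(r)
    if r <= 0.5 or r >= 2.0:
        return 0.0
    if r <= 1.0:
        return math.sin(0.5 * math.pi * smooth_step(2.0 * r - 1.0))
    return math.cos(0.5 * math.pi * smooth_step(r - 1.0))

def subband_weight(radius, j):
    """Transfer function of F_j evaluated at frequency radius |xi|."""
    return window(radius / 2.0**j)

def overlap(j, j_prime, radii):
    """Largest value of W(|xi|/2^j) * W(|xi|/2^j') over the sampled radii."""
    return max(subband_weight(r, j) * subband_weight(r, j_prime) for r in radii)

# Sample frequency radii in (0, 64]; levels -10..10 cover this range.
radii = [0.01 * k for k in range(1, 6401)]
worst_gap = max(abs(sum(subband_weight(r, j)**2 for j in range(-10, 11)) - 1.0)
                for r in radii)
print("max deviation of sum_j W(r/2^j)^2 from 1:", worst_gap)
print("overlap of levels 0 and 1:", overlap(0, 1, radii))
print("overlap of levels 0 and 2:", overlap(0, 2, radii))
```

Adjacent levels share part of an annulus, while levels differing by two or more have a vanishing product of transfer functions, which by Plancherel is the orthogonality asserted in the Lemma.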
For future use, let $\Lambda_j$ denote the collection of indices $(j, k)$ of wavelets at level $j$, and

$$\Lambda_j^{\pm n} = \bigcup_{j'=j-n}^{j+n} \Lambda_{j'}.$$

Similarly, let $\Delta_j$ denote the indices $\eta = (j, k, \ell)$ of curvelets at level $j$, and let

$$\Delta_j^{\pm n} = \bigcup_{j'=j-n}^{j+n} \Delta_{j'}.$$

We conclude that elements of $\mathcal{F}_j$ can be represented using either only radial wavelets $\{\psi_\lambda :
\lambda \in \Lambda_j^{\pm 1}\}$ or only curvelets $\{\gamma_\eta : \eta \in \Delta_j^{\pm 1}\}$.

1.4 Separation via $\ell_1$ Minimization
1.4.1 Sparse Multiple Frame Expansions 

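We now have two complete representations for $\mathcal{F}_j$, yielding two ways of representing the
subband component $f_j$: in terms of its wavelet expansion:

$$f_j = \sum_{\lambda \in \Lambda_j^{\pm 1}} \langle \psi_\lambda, f_j \rangle \psi_\lambda,$$

or in terms of its curvelet expansion:

$$f_j = \sum_{\eta \in \Delta_j^{\pm 1}} \langle \gamma_\eta, f_j \rangle \gamma_\eta.$$

Each frame exhibits a single geometric tendency - either highly nondirectional or highly
directional - in representing $f_j$. However, $f_j$ may have both isotropic and directional features.
We therefore seek a combined representation

$$f_j = \sum_{\lambda \in \Lambda_j^{\pm 1}} w_\lambda \psi_\lambda + \sum_{\eta \in \Delta_j^{\pm 1}} c_\eta \gamma_\eta.$$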

Because the combined frame formed by concatenating the two frames is overcomplete, there 
are many possible ways this decomposition can be done. Some of them may be geometrically 
motivated, many are not. 

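Consider the following dual-frame Component Separation problem based on $\ell_1$ minimization:

$$(\mathrm{CSep}) \qquad (W_j, C_j) = \operatorname{argmin}\ \|w\|_1 + \|c\|_1 \quad \text{subject to} \quad f_j = W_j + C_j$$

$$\text{and} \quad w_\lambda = \langle W_j, \psi_\lambda \rangle,\ \lambda \in \Lambda_j^{\pm 1}, \quad \text{and} \quad c_\eta = \langle C_j, \gamma_\eta \rangle,\ \eta \in \Delta_j^{\pm 1}.$$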
In words, we take a given scale subband $f_j$ and decompose it into a wavelet component
$W_j$ and a curvelet component $C_j$. The components are chosen by the principle of $\ell_1$ minimization
on the frame coefficients: the $\ell_1$ norm of the wavelet coefficients of the wavelet
component should be small, and the $\ell_1$ norm of the curvelet coefficients of the curvelet
component should be small.

Here is our reason for the 'component separation' label: Armed with the optimization
result at each scale subband, we define the purported pointlike component as the superposition
of all the wavelet terms:

$$P = \sum_j F_j \star W_j,$$

and the purported curvelike component as the superposition of all the curvelet terms:

$$C = \sum_j F_j \star C_j.$$

We obtain the decomposition

$$f = P + C.$$



1.4.2 Main Result 

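At this stage, we have two decompositions: one by the truly geometric pair $(\mathcal{P}, \mathcal{C})$ of pointlike
and curvelike objects and one by the purported geometric pair $(P, C)$. The following result
justifies our interest in the second pair. To state it, define the scale subbands of the truly
geometric components by:

$$\mathcal{P}_j = F_j \star \mathcal{P}, \qquad \mathcal{C}_j = F_j \star \mathcal{C}.$$

Theorem 1.1 Asymptotic Separation.

$$\frac{\|W_j - \mathcal{P}_j\|_2 + \|C_j - \mathcal{C}_j\|_2}{\|\mathcal{P}_j\|_2 + \|\mathcal{C}_j\|_2} \to 0, \qquad j \to \infty. \tag{1.6}$$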



At fine scales, the truly pointlike component is almost all captured by the wavelet com- 
ponent and the truly curvelike component is almost all captured by the curvelet component. 
In short, the purported pointlike and curvelike components deserve the labelling they have 
been given. 



1.5 Extensions 

Theorem 1.1 is amenable to generalizations and extensions. Previewing Section 8.1, we 
mention a few examples. 

• More General Classes of Objects. Theorem 1.1 can be generalized to other situations. 
First, we could consider singularities of different orders. This would allow $\mathcal{C}$ to model
'cartoon' images, where the curvilinear singularities are now the boundaries of the
pieces for piecewise functions. Second, we can allow smooth perturbations, i.e.,
$f = (\mathcal{P} + \mathcal{C} + g) \cdot h$ where $g$, $h$ are smooth functions of rapid decay at $\infty$. In this
situation, we let the denominator in (1.6) be simply $\|f_j\|_2$.
• Other Frame Pairs. Theorem 1.1 holds without change for many other pairs of frames
and bases, such as, e.g., orthonormal separable Meyer wavelets and shearlets. 



• Noisy Data. Theorem 1.1 is resilient to noise impact; an image composed of $\mathcal{P}$ and $\mathcal{C}$
with additive 'sufficiently small' noise exhibits the same asymptotic separation. 

• Rate of Convergence. Theorem 1.1 can be accompanied by explicit decay estimates. 

• Other Algorithms and Other Notions of Separation. In the companion paper [20] we 
study thresholding as an alternative approach to separation; it is less computationally
demanding than the $\ell_1$ minimization studied here, but also somewhat less elegant.
Building on the estimates proved in this paper, [20] shows that properly-tuned thresh- 
olding can also achieve asymptotic separation. 

1.6 The Multiple-Basis Representation Problem 

Theorem 1.1 should be placed in context of a great deal of ongoing work concerning spar- 
sity and overcomplete representations. Already in the early 1990's, R.R. Coifman became 
interested in the problem of representing discrete-time signals using more than one basis. 
In a conversation, he told one of us about a problem which, in retrospect and using modern 
formulations, can be posed as follows: 

• An observed signal $S \in \mathbb{R}^n$ is thought to be a superposition of subsignals $S_i$, $i = 1, 2$.

• Each subsignal $S_i$ is thought to be 'coherent' in an 'appropriate' basis $\Phi_i$, $i = 1, 2$.

• Each subsignal 'looks incoherent' in an 'inappropriate' basis. Here $\Phi_2$ is inappropriate
for $S_1$, and $\Phi_1$ is inappropriate for $S_2$.

Coifman, Wickerhauser and co-workers at the time made a sort of heuristic exploration 
motivated intuitively by these slogans. As a published example of their work at the time, 
please see [12, Fig. 26(a-h)]. The different 'coherent parts' displayed in those figures were 
obtained by the following recipe: 

1. Transform signal $S$ into basis $\Phi_1$.

2. Threshold the coefficients, yielding sparse coefficients $\alpha_1$.

3. Form residual $R = S - \Phi_1 \alpha_1$.

4. Transform $R$ into basis $\Phi_2$.

5. Threshold the coefficients, yielding sparse coefficients $\alpha_2$.

6. Write $\tilde{S}_i = \Phi_i \alpha_i$; then

$$S = \tilde{S}_1 + \tilde{S}_2 + \text{residual}. \tag{1.7}$$
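
The six-step recipe can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reconstruction of the original experiments: the two bases are taken to be spikes and an orthonormal DCT-II basis, the thresholds are hard thresholds with hand-picked levels, and all function names (`two_basis_recipe`, `hard_threshold`, and so on) are hypothetical.

```python
import math

def dct_basis(n):
    """Rows are orthonormal DCT-II atoms (assumed stand-in for the second basis)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return rows

def identity_basis(n):
    """Rows are Kronecker spikes (assumed stand-in for the first basis)."""
    return [[1.0 if t == i else 0.0 for t in range(n)] for i in range(n)]

def analyze(basis, signal):
    """Coefficients of the signal in an orthonormal basis."""
    return [sum(a * s for a, s in zip(atom, signal)) for atom in basis]

def synthesize(basis, coeffs):
    """Signal rebuilt from basis coefficients."""
    n = len(basis[0])
    return [sum(c * atom[t] for c, atom in zip(coeffs, basis)) for t in range(n)]

def hard_threshold(coeffs, level):
    """Keep coefficients whose magnitude exceeds the level; zero the rest."""
    return [c if abs(c) > level else 0.0 for c in coeffs]

def two_basis_recipe(signal, basis1, basis2, level1, level2):
    alpha1 = hard_threshold(analyze(basis1, signal), level1)  # steps 1-2
    part1 = synthesize(basis1, alpha1)
    resid = [s - p for s, p in zip(signal, part1)]             # step 3
    alpha2 = hard_threshold(analyze(basis2, resid), level2)   # steps 4-5
    part2 = synthesize(basis2, alpha2)                         # step 6
    leftover = [r - p for r, p in zip(resid, part2)]           # residual in (1.7)
    return alpha1, alpha2, part1, part2, leftover

n = 32
spikes, cosines = identity_basis(n), dct_basis(n)
# Test signal: one spike of amplitude 5 plus one unit-norm cosine atom.
signal = [5.0 * e + c for e, c in zip(spikes[10], cosines[3])]
alpha1, alpha2, part1, part2, leftover = two_basis_recipe(signal, spikes, cosines, 1.0, 0.5)
print("spike support:", [i for i, a in enumerate(alpha1) if a != 0.0])
print("cosine support:", [k for k, a in enumerate(alpha2) if a != 0.0])
```

In this toy case the recipe identifies the correct atoms, but the recovered spike amplitude also absorbs the cosine's value at that sample, so the two parts are only approximately equal to the true subsignals and the residual is nonzero.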
At about the same time, Stéphane Mallat and Zhifeng Zhang became interested in the
problem of representing signals using a highly overcomplete dictionary of time-frequency
atoms [34] (their dictionary had $\sim \log(N)$ different frames, where $N$ is the signal length).
Their approach, called Matching Pursuit, built up an approximation one-term-at-a-time it- 
eratively, at each stage finding the best single atom in any of the several bases which was not 
yet already forming part of the approximation and adding that term to the approximation. 
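
The greedy procedure just described can be sketched as follows. This is a hedged toy version, not Mallat and Zhang's implementation: the dictionary is assumed to be the union of spikes and orthonormal DCT-II atoms rather than time-frequency atoms, atoms already selected are excluded from later steps as in the description above, and the names `build_dictionary` and `matching_pursuit` are hypothetical.

```python
import math

def build_dictionary(n):
    """Union of spikes and orthonormal DCT-II atoms: 2n unit-norm atoms in R^n."""
    spikes = [[1.0 if t == i else 0.0 for t in range(n)] for i in range(n)]
    cosines = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        cosines.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return spikes + cosines

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, atoms, n_terms):
    """Greedy one-term-at-a-time approximation; returns chosen atoms and final residual."""
    residual = list(signal)
    chosen = []   # (atom index, coefficient) pairs in order of selection
    used = set()
    for _ in range(n_terms):
        # Best single unused atom: largest correlation with the current residual.
        best = max((i for i in range(len(atoms)) if i not in used),
                   key=lambda i: abs(inner(atoms[i], residual)))
        coeff = inner(atoms[best], residual)
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
        chosen.append((best, coeff))
        used.add(best)
    return chosen, residual

n = 32
atoms = build_dictionary(n)
# Test signal: spike number 10 (amplitude 5) plus cosine atom number 3 (amplitude 3).
signal = [5.0 * s + 3.0 * c for s, c in zip(atoms[10], atoms[n + 3])]
chosen, residual = matching_pursuit(signal, atoms, 2)
print("selected atoms:", [i for i, _ in chosen])
print("relative residual:", math.sqrt(inner(residual, residual) / inner(signal, signal)))
```

Two steps select the two atoms actually present, yet the residual does not vanish: because the spike and the cosine are not orthogonal, each greedy coefficient is slightly off, which is one concrete form of the question raised next in the text.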

Lurking in these early numerical experiments were two larger questions. If there truly 
is a simple representation of the signal using more than one system, can it ever be found? 
Can it be found by such a simple approach? 

Formally: can one accurately recover 'coherent' pieces $S_1$ and $S_2$ given knowledge of
$S = S_1 + S_2$ only? For example, can we expect that the outputs $\tilde{S}_1$, $\tilde{S}_2$ in (1.7) obey
$\tilde{S}_1 \approx S_1$ and $\tilde{S}_2 \approx S_2$? Researchers at the time said in conversation that, when put this
starkly, the answer was simply 'no', since there are twice as many unknowns as knowns.
Nevertheless, some of the empirical results at the time were suggestive and inspiring. 

1.7 Minimum $\ell_1$ Decomposition and Perfect Separation

A few years later, one of us worked with Scott Shaobing Chen to develop a formal,
optimization-based approach to the multiple-basis representation problem. Given bases $\Phi_i$, $i = 1, 2$,
one solves the following problem

$$(\mathrm{BP}) \qquad \min \|\alpha_1\|_1 + \|\alpha_2\|_1 \quad \text{subject to} \quad S = \Phi_1 \alpha_1 + \Phi_2 \alpha_2.$$

Here $\|\cdot\|_1$ denotes the usual $\ell_1$ norm. Note that here there are $2n$ unknowns in $\alpha_1$ and
$\alpha_2$ and only $n$ knowns in $S$, but that an optimization principle is being used to select a
particular element from the n-dimensional space of all possible solutions. (Terminological 
note: the name 'Basis Pursuit' is meant to remind the reader that (BP) actually selects a 
basis for the solution out of the many conceivable bases which can be extracted from the 
union of the two overcomplete systems). 

Based on earlier experience of our first-named author and his collaborators, see [19,
22, 23], it was known that the $\ell_1$ norm had a tendency to find sparse solutions when they
exist. And indeed, Chen's thesis showed that in some simple special cases this was so.
Letting $\Phi_1$ be the standard basis of $\mathbb{R}^n$ (i.e., Kronecker sequences or 'spikes') and $\Phi_2$ be
the Fourier basis, Chen considered signals $S$ which were superpositions of two spikes and
two sinusoids. He showed that (BP) recovered exactly the indices and coefficients of the 
terms involved in the synthesis; and that this was true across a wide range of amplitude 
ratios between the sinusoid and spike components. In short, there was perfect separation of 
sinusoids from spikes, and the true underlying simplicity of the signal was revealed - even 
though there were more unknowns than equations. 

In the years since that work, two streams of research emerged. 

• Theoretical work, showing that, indeed, one could in certain settings obtain the sparsest
possible representations to an underdetermined problem by $\ell_1$ optimization; see,
e.g., [9, 13, 14, 15, 17, 45] for a selection of general work concerning $\ell_1$ minimization,
and [2, 16, 18, 24, 27] for work somewhat relevant to component separation.
• Empirical work, showing that combined representations such as wavelets with curvelets
or wavelets with sinusoids often gave very compelling separations of real signals and 
images, see, for instance, [1, 10, 25, 26, 35, 42, 40, 41, 43, 44, 30, 47]. 

We have already mentioned the empirical successes of Starck and collaborators. For an 
overview of much recent work on sparse decompositions, see [3]. Note that geometric sep- 
aration is somewhat different from the task of separation of texture from smooth structure; 
in that problem, sparsity in frame expansions does not play an explicit role, nor do the 
geometric considerations which are so important here; very interesting early work in such 
non-geometric separation was published by Yves Meyer [36], and Vese and Osher [37]. 

1.8 Theorem 1.1 in Context, and Outline of Paper 

We can now place our result in context, via several comparisons and contrasts, looking 
ahead to themes developed below. 

• Microlocal Viewpoint. In Theorem 1.1 the objects of interest are collections of point 
and curve singularities. The viewpoint derives from microlocal analysis (see Sec- 
tion 3), which says that points and curves are very different objects in their joint 
space/orientation structure, so that even if they happen to overlap spatially, they are 
microlocally distinct. In contrast, other work on sparsity and $\ell_1$ minimization typically
has a discrete flavor, making hypotheses about the number of nonzeros in an
expansion and assuming the dictionary elements interact randomly. 

• Microlocal Asymptotics. Asymptotics are important for Theorem 1.1; the sharp sepa- 
ration between curves and points in microlocal phase space exists only as a limit phe- 
nomenon, as the scale tends to zero. Asymptotic statements are important in other 
literature on sparsity-driven decompositions, but they are asymptotic in the number 
of random elements in the underlying matrix, and exploit law-of-large-numbers and 
concentration-of-measure effects. For Theorem 1.1, such principles play no role. 

• Clustered Sparsity. In other work on sparsity-driven decompositions, sparsity of the 
coefficients plays a role primarily through the number of nonzeros. In this work 
sparsity plays a role also through the arrangement of nonzeros; Section 2.1 introduces a 
notion of clustered sparsity and Sections 4-7 develop estimates bounding the locations 
of significant nonzeros in the wavelet expansion of a point singularity or in the curvelet 
expansion of a curve singularity; the estimates will be organized using microlocal phase
space ideas described in Section 3. 

• Cluster Coherence. In other work on sparsity-driven decompositions, coherence or 
restricted isometry principles play a role; these don't depend on the arrangement 
of nonzeros in an expansion. Moreover, these are often applied to random-dictionary 
situations where the interaction between frame elements is random and quasi arbitrary. 
Section 2.1 develops the notion of cluster coherence which specifically depends on the 
arrangement of nonzeros. We apply this notion to a dictionary (wavelets + curvelets) 
where interactions are geometrically driven, and we develop estimates motivated by 
microlocal analysis which provide the needed geometrical information. 
Our paper begins with Sections 2 and 3 introducing the driving ideas of clustered sparsity,
cluster coherence, and microlocal separation, and describing a plan to prove Theorem
1.1 by establishing several needed estimates. Later Sections 4, 5, 6, and 7 then develop 
these estimates, by developing results about wavelet and curvelet expansions of point and 
line singularities. Section 8 then mentions a number of possible extensions. 



2 Component Separation by $\ell_1$ Minimization

We now study the behavior of $\ell_1$ minimization in the two-frame case. Our analysis centers
on the use of cluster coherence to control joint concentration. 



2.1 $\ell_1$ Minimization for Separation of Two Tight Frames

Suppose we have two tight frames $\Phi_1$, $\Phi_2$ in a Hilbert space $\mathcal{H}$, and a signal vector $S \in \mathcal{H}$.
We know a priori that there exists a decomposition

$$S = S_1^0 + S_2^0,$$

where $S_1^0$ is sparse in Frame 1, and $S_2^0$ is sparsely represented in Frame 2.
Consider the following optimization problem

$$(\mathrm{Sep}) \qquad (S_1^\star, S_2^\star) = \operatorname{argmin}_{S_1, S_2} \|\Phi_1^T S_1\|_1 + \|\Phi_2^T S_2\|_1 \quad \text{subject to} \quad S = S_1 + S_2. \tag{2.1}$$

The optimization problem (Sep) is visibly similar to, but subtly different from, (BP). 
Here the $\ell_1$ norm is being applied on the analysis coefficients of the two different 'components'
rather than on the individual synthesis coefficients. The hope in (BP) is to get
exactly the right nonzero coefficients in the sense of those providing the sparsest representa- 
tion. However, this can become numerically unstable for certain tight frames. The hope in 
(Sep) is merely to separate components rather than the more ambitious goal of identifying 
the true nonzero coefficients within each component's representation. Starck and Elad have 
found this distinction to be important in their own empirical work on separation. 

To analyze this we need the following notion. 

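Definition 2.1 Let $\Phi_1$ and $\Phi_2$ be two tight frames. Given two sets of coefficients $\mathcal{S}_1$ and
$\mathcal{S}_2$, define the joint concentration $\kappa = \kappa(\mathcal{S}_1, \mathcal{S}_2)$ by

$$\kappa(\mathcal{S}_1, \mathcal{S}_2) = \sup_f \frac{\|1_{\mathcal{S}_1} \Phi_1^T f\|_1 + \|1_{\mathcal{S}_2} \Phi_2^T f\|_1}{\|\Phi_1^T f\|_1 + \|\Phi_2^T f\|_1}.$$

In words, we consider the maximal fraction of total $\ell_1$ norm which can be concentrated to the
combined index set $\mathcal{S}_1 \cup \mathcal{S}_2$. Concepts of this kind go back to [23]. Adequate control of joint
concentration ensures that the principle (2.1) gives a successful approximate separation.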

Proposition 2.1 Suppose that $S$ can be decomposed as $S = S_1^0 + S_2^0$ so that each component
$S_i^0$ is relatively sparse in $\Phi_i$, $i = 1, 2$, i.e.,

$$\|1_{\mathcal{S}_1^c} \Phi_1^T S_1^0\|_1 + \|1_{\mathcal{S}_2^c} \Phi_2^T S_2^0\|_1 \le \delta.$$

Let $(S_1^\star, S_2^\star)$ solve (2.1). Then

$$\|S_1^\star - S_1^0\|_2 + \|S_2^\star - S_2^0\|_2 \le \frac{2\delta}{1 - 2\kappa}.$$

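Definition 2.2 Given tight frames $\Phi = (\phi_i)_i$ and $\Psi = (\psi_j)_j$ and an index subset $\mathcal{S}$ associated
with expansions in frame $\Phi$, we define the cluster coherence

$$\mu_c(\mathcal{S}, \Phi; \Psi) = \max_j \sum_{i \in \mathcal{S}} |\langle \phi_i, \psi_j \rangle|.$$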

In many studies of $\ell_1$ optimization, one utilizes instead the mutual coherence

$$\mu(\Phi, \Psi) = \max_j \max_i |\langle \phi_i, \psi_j \rangle|, \tag{2.2}$$

whose importance was shown by [18]. This may be called the singleton coherence. In
contrast, cluster coherence bounds coherence between a single member of frame $\Psi$ and a
cluster of members of frame $\Phi$, clustered at $\mathcal{S}$.

A related notion called 'cumulative coherence' was introduced in [45]; that notion maximizes
over subsets $\mathcal{S}$ of a given size, whereas here we fix a specific set $\mathcal{S}$ of coefficients. In
applying our concept, the index subsets we will consider are not abstract, but will instead 
have a specific geometric interpretation, associated to proximity to certain curves in phase 
space. Maximizing over all subsets of a given size would give very loose bounds, and would 
not be suitable for our purposes. Several other coherence measures involving subsets appear 
in the literature, e.g., [4] and [46], but we do not see a strong relation to cluster coherence. 
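
To make the difference between singleton and cluster coherence concrete, the following sketch evaluates both for a toy pair of orthonormal bases. Everything here is an illustrative assumption: spikes play the role of $\Phi$, an orthonormal DCT-II basis plays the role of $\Psi$, the cluster is an arbitrary block of eight adjacent spikes, and the function names are hypothetical. It is not the wavelet-curvelet setting analyzed in this paper.

```python
import math

def spike_atoms(n):
    """Kronecker spikes in R^n (assumed stand-in for frame Phi)."""
    return [[1.0 if t == i else 0.0 for t in range(n)] for i in range(n)]

def dct_atoms(n):
    """Orthonormal DCT-II atoms in R^n (assumed stand-in for frame Psi)."""
    atoms = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        atoms.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return atoms

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def mutual_coherence(phi, psi):
    """Singleton coherence: largest |<phi_i, psi_j>| over all pairs."""
    return max(abs(inner(p, q)) for p in phi for q in psi)

def cluster_coherence(cluster, phi, psi):
    """max over psi_j of the sum over i in the cluster of |<phi_i, psi_j>|."""
    return max(sum(abs(inner(phi[i], q)) for i in cluster) for q in psi)

n = 64
phi, psi = spike_atoms(n), dct_atoms(n)
cluster = list(range(8))            # hypothetical cluster: eight adjacent spikes
mu = mutual_coherence(phi, psi)
mu_c = cluster_coherence(cluster, phi, psi)
singleton = cluster_coherence([5], phi, psi)
print("mutual coherence:", mu)
print("cluster coherence of the 8-spike block:", mu_c)
print("cluster coherence of a single spike:", singleton)
```

By construction the cluster coherence never exceeds the cluster size times the mutual coherence, and a one-element cluster never exceeds the mutual coherence. For this ill-chosen block the cluster coherence is at least 1, since the constant cosine atom overlaps every spike equally, which shows why the set must be chosen with the geometry in mind.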

Lemma 2.1 We have

$$\kappa(\mathcal{S}_1, \mathcal{S}_2) \le \max\{\mu_c(\mathcal{S}_1, \Phi_1; \Phi_2),\ \mu_c(\mathcal{S}_2, \Phi_2; \Phi_1)\}.$$

The proofs for Proposition 2.1 and Lemma 2.1 are presented in Section 9.1. 



2.2 Intended Application 

The concepts of this section will now be applied to (CSep), at scale $j$ only. With $f = \mathcal{P} + \mathcal{C}$
our distribution of interest and $F_j$ our bandpass filter, we set $f_j = F_j \star f$. Throughout
this section, the object $S = f_j$ and the tight frames are $\Phi_1$, the full radial wavelet frame,
and $\Phi_2$, the full curvelet tight frame. We apply the optimization problem (Sep), getting
subsignal components $S_1^\star$ and $S_2^\star$, which we then relabel as the wavelet component $W_j$ and
curvelet component $C_j$; one should check that with this sequence of substitutions, problem
(Sep) of this section becomes (CSep) of the introduction.

The key problem in the application of Proposition 2.1 to the aforementioned setting is 
the correct choice of the clusters of significant coefficients at each scale. If those clusters are 
chosen 'too small asymptotically', the relative sparsity will blow up, and if chosen 'too large 
asymptotically', we lose control of the cluster coherence. We define those clusters on the 
ideal decomposition, where wavelets are used to analyze the point singularity and curvelets 
are used to analyze the curve singularity. (Such clusters are theoretical, non-observable 
entities.) 
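For a series of wavelet-coefficient thresholds $\epsilon_{j,1}$ to be specified, the cluster of significant
wavelet coefficients can be provisionally defined as

$$\mathcal{S}_{1,j} = \{\lambda \in \Lambda_j^{\pm 1} : |\langle \psi_\lambda, \mathcal{P}_j \rangle| > \epsilon_{j,1} \cdot \|(\langle \psi_{\lambda'}, \mathcal{P}_j \rangle)_{\lambda'}\|_{\ell_\infty(\Lambda_j^{\pm 1})}\}. \tag{2.3}$$

For a series of curvelet-coefficient thresholds $\epsilon_{j,2}$ to be specified, the cluster of significant
curvelet coefficients can be provisionally taken as

$$\mathcal{S}_{2,j} = \{\eta \in \Delta_j^{\pm 1} : |\langle \gamma_\eta, \mathcal{C}_j \rangle| > \epsilon_{j,2} \cdot \|(\langle \gamma_{\eta'}, \mathcal{C}_j \rangle)_{\eta'}\|_{\ell_\infty(\Delta_j^{\pm 1})}\}. \tag{2.4}$$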

Each threshold choice picks a specific point on the tradeoff between relative sparsity and 
cluster coherence. 

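Then $\delta_j$ will denote the degree of approximation by significant coefficients, the sum
$\delta_j = \delta_{j,1} + \delta_{j,2}$ of the wavelet approximation error to the point singularity:

$$\delta_{j,1} = \sum_{\lambda \in \mathcal{S}_{1,j}^c} |\langle \psi_\lambda, \mathcal{P}_j \rangle|,$$

and the curvelet approximation error to the curvilinear singularity:

$$\delta_{j,2} = \sum_{\eta \in \mathcal{S}_{2,j}^c} |\langle \gamma_\eta, \mathcal{C}_j \rangle|.$$

Finally, $\kappa(\mathcal{S}_{1,j}, \mathcal{S}_{2,j})$, the degree of joint wavelet-curvelet concentration at the significant
subsets, will be controlled by two cluster coherences: $\mu_c(\mathcal{S}_{1,j}, \Phi_1; \Phi_2)$, the maximal coherence
of a curvelet to a cluster of significant wavelet coefficients; and $\mu_c(\mathcal{S}_{2,j}, \Phi_2; \Phi_1)$, the maximal
coherence of a wavelet to a cluster of significant curvelet coefficients. We have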

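Corollary 2.1 Suppose that the sequence of transform-space clusters $(\mathcal{S}_{1,j})$ and $(\mathcal{S}_{2,j})$ has
both of the following two properties: (i) asymptotically negligible cluster coherences:

$$\mu_c(\mathcal{S}_{1,j}, \Phi_1; \Phi_2),\ \mu_c(\mathcal{S}_{2,j}, \Phi_2; \Phi_1) \to 0, \qquad j \to \infty,$$

and (ii) asymptotically negligible cluster approximation errors:

$$\delta_j = \delta_{j,1} + \delta_{j,2} = o(\|f_j\|_2), \qquad j \to \infty.$$

Then we have asymptotically near-perfect separation:

$$\frac{\|W_j - \mathcal{P}_j\|_2 + \|C_j - \mathcal{C}_j\|_2}{\|f_j\|_2} \to 0, \qquad j \to \infty.$$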

The main result, Theorem 1.1, follows from this corollary, but this will require sufficiently
good estimates for cluster coherence for clusters defined as sufficiently good approximations 
to the objects of interest. 

We remark that although the threshold in the provisional definition of the clusters in 
(2.3)-(2.4) provides a means to balance between relative sparsity and cluster coherence, 
there is no a priori guarantee that there exists a threshold for which conditions (i) and (ii) 
of Corollary 2.1 are true. The main achievement of this paper is to show that this is indeed 
possible with a revised definition. In fact we do not finally define the clusters by (2.3)-(2.4); 
note that the clusters must simply exhibit properties (i) and (ii) of the Corollary. We intend 
to make use of our freedom of definition in order to get the needed properties. 



Figure 1: Singular supports of the point singularity $\mathcal{P}$ and the curvilinear singularity $\mathcal{C}$. 
The two supports overlap in one point. 



3 Microlocal Analysis Viewpoint 

The proof of the main result boils down to defining clusters and bounding cluster coherences 
and cluster approximation errors; our approach is inspired by microlocal analysis. In effect, 
we consider the case where the cluster is either a string of curvelet coefficients in the cone of 
influence of a curvilinear singularity, or a block of wavelet coefficients in the cone of influence 
of a point singularity, and we must bound interactions between clusters in one frame and 
elements in the other frame. Microlocal analysis provides a simple organizational framework 
that immediately suggests which interactions 'ought' to be small and large, based on the 
geometry of the overlaps between phase portraits in the underlying microlocal phase space. 



3.1 Microlocal Analysis Concepts 

The singular support of a distribution $f$, sing supp($f$), is the set of points where $f$ is not 
locally $C^\infty$. In the geometric separation setting, we have 

\[ \text{sing supp}(f) = \text{sing supp}(\mathcal{P} + \mathcal{C}) = \text{sing supp}(\mathcal{P}) \cup \text{sing supp}(\mathcal{C}) = \{x_i\} \cup \text{image}(\tau) \]

because we have constructed the distributions $\mathcal{P}$ and $\mathcal{C}$ so their singularities have this form. 

Note that the points $x_i$ can lie on the image of the curve $\tau$; we make no separation 
hypothesis asking the point singularities to 'stay away' from the curvilinear singularities. 
Figure 1 displays the singular support of $f$ and the contributions from $\mathcal{P}$ and $\mathcal{C}$. 

To properly separate pointlike and curvelike singularities we need to consider 
a phase space for microlocal analysis indexed by position-orientation pairs $(b, \theta)$; such pairs 
can describe the locations and orientations where $f$ has singular behavior. The orientational 
component will be regarded as an element of $\mathbb{P}^1$, the real projective space in $\mathbb{R}^2$. (Here we 
identify $\mathbb{P}^1$ with $[0, \pi)$ and freely write one or the other in what follows. It may at first seem 
more natural to think of directions $[0, 2\pi)$ rather than orientations $[0, \pi)$; note, however, that 
in this paper we consider real-valued distributions $\mathcal{P} + \mathcal{C}$ measured by real-valued curvelets 
$\gamma_\eta$, so directions are not resolvable, only orientations. We also frequently abuse notation as 
follows: we will write $|\theta - \theta'|$ when what is actually meant is geodesic distance between two 
points on $\mathbb{P}^1$.) 

Figure 2: Left panel: singular supports of $\mathcal{P}$ and $\mathcal{C}$ (compare Figure 1). Right panel: 
wavefront sets of $\mathcal{P}$ and $\mathcal{C}$ are indicated by blue and red tubes in $(b_1, b_2, \theta)$ space. The 
singular support of $\mathcal{C}$ is also indicated; it is the $b_1$-$b_2$ projection of the wavefront set. Due 
to the additional dimension, the wavefront sets are more distinctly separated than the 
corresponding singular supports. 

Living in this phase space is the wavefront set $WF(f)$; roughly, this is the set of position-orientation 
pairs at which $f$ is nonsmooth; for more details, see [29, 7, 31]. 
Under the geometric separation model of Section 1, we have 

\[ WF(\mathcal{P}) = \text{sing supp}(\mathcal{P}) \times \mathbb{P}^1, \]

since a point singularity is singular in all directions on its singular support, and is singular 
nowhere else; while 

\[ WF(\mathcal{C}) = \{(\tau(t), \theta(t)) : t \in [0, L(\tau)]\}, \]

where $\tau(t)$ is a unit-speed parametrization of $\mathcal{C}$ and $\theta(t)$ is the normal direction to $\mathcal{C}$ at $\tau(t)$, 
regarded in $\mathbb{P}^1$. 

It is convenient to think of the parameter space for microlocal analysis as a plane of 
positions $b$ lying beneath a third dimension of orientations $\theta$. Then the wavefront set of a 
point singularity $\mathcal{P}$ concentrated on a single point $x_i$ is a vertical line segment $\{x_i\} \times \mathbb{P}^1$, 
corresponding to singular behavior in every direction at a given point, while the wavefront 
set of $\mathcal{C}$ is a more general curve in phase space. Even if $x_i$ meets image($\tau$), so the singular 
supports of the point singularity and of a curvilinear singularity overlap at $x_i$, they behave 
quite differently as wavefront sets in the full 3D parameter space, which gives us hope for 
separation. 

Figure 2 illustrates the 3D phase space with pointlike and curvelike singularities super- 
posed. 

3.2 Support of Frame Elements 

A radial wavelet $\psi_\lambda$, $\lambda = (a_0, b_0)$, is 'morally' supported in $x$-space in a spatial ball $B(a_0, b_0)$, 
defined by 

\[ B(a_0, b_0) = \{x : |x - b_0|/a_0 \le 1\}; \]

this statement should not, however, be taken too literally, as the radial wavelets we study 
are not of compact support. The more precise statement is that the wavelet decays rapidly 
in the variable $|x - b_0|$. The next statement uses the notation 

\[ \langle a \rangle = (1 + a^2)^{1/2}. \]

Lemma 3.1 For each $N = 1, 2, \ldots$ there is a constant $c_N$ so that 

\[ |\psi_{a,b}(x)| \le c_N \cdot a^{-1} \cdot \langle |x - b|/a \rangle^{-N}, \quad \forall a \in \mathbb{R}^+, \; \forall b, x \in \mathbb{R}^2. \]

An individual curvelet $\gamma_\eta$, $\eta = (a_0, b_0, \theta_0)$, is 'morally' supported in $x$-space inside 
an anisotropic spatial ellipse. To make this precise, let $D_{1/a}$ be the diagonal matrix 
diag$(1/a, 1/\sqrt{a})$ and let $R_\theta$ denote planar rotation by $-\theta$ radians. Let 

\[ P_{a,\theta} = D_{1/a} R_\theta \]

denote the parabolic directional dilation operator, which dilates much more strongly in the 
$\theta$ direction than in the orthogonal direction. For a vector $v \in \mathbb{R}^2$ define the norm 

\[ |v|_{a,\theta} = |P_{a,\theta}(v)|; \]

the unit ball in this norm is ellipsoidal, with minor axis pointing in direction $\theta$. A curvelet 
is morally supported in the ellipse $E(a_0, b_0, \theta_0)$ defined by 

\[ E(a_0, b_0, \theta_0) = \{x : |x - b_0|_{a_0,\theta_0} \le 1\}. \]

Again the correct formal statement is that there is rapid decay in the variable $|x - b|_{a,\theta}$. 
The following is proved in [7, Lemma 1, page 168]. 

Lemma 3.2 For each $N = 1, 2, \ldots$ there is a constant $c_N$ so that 

\[ |\gamma_{a,b,\theta}(x)| \le c_N \cdot a^{-3/4} \cdot \langle |x - b|_{a,\theta} \rangle^{-N}, \quad \forall a \in \mathbb{R}^+, \; \forall \theta \in [0, \pi), \; \forall b, x \in \mathbb{R}^2. \tag{3.1} \]
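
The parabolic norm and the ellipse of moral support can be made concrete with a minimal numerical sketch. The function names `parabolic_norm` and `in_ellipse` are illustrative and do not come from the paper; we assume that the rotation is by $-\theta$ radians and that the dilation is diag$(1/a, 1/\sqrt{a})$, as in the definition of $P_{a,\theta}$.

```python
import math

def parabolic_norm(v, a, theta):
    """|v|_{a,theta} = |D_{1/a} R_theta v|, with R_theta the rotation by -theta
    and D_{1/a} = diag(1/a, 1/sqrt(a))."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    # Rotate by -theta: the direction theta is mapped onto the first axis.
    u1 = c * x + s * y
    u2 = -s * x + c * y
    # Parabolic dilation: strong (1/a) along theta, weak (1/sqrt(a)) across it.
    return math.hypot(u1 / a, u2 / math.sqrt(a))

def in_ellipse(x, a0, b0, theta0):
    """Membership test for E(a0, b0, theta0) = {x : |x - b0|_{a0,theta0} <= 1}."""
    return parabolic_norm((x[0] - b0[0], x[1] - b0[1]), a0, theta0) <= 1.0

a, theta = 0.01, math.pi / 3
# Semi-axis of length a along direction theta (the minor axis).
minor = (a * math.cos(theta), a * math.sin(theta))
# Semi-axis of length sqrt(a) orthogonal to theta (the major axis).
major = (-math.sqrt(a) * math.sin(theta), math.sqrt(a) * math.cos(theta))
print(parabolic_norm(minor, a, theta), parabolic_norm(major, a, theta))
```

Both printed values are 1 up to rounding, so the unit ball has semi-axes $a$ (along $\theta$) and $\sqrt{a}$ (across $\theta$), which is the anisotropy the lemma exploits.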

In Figure 3, we visualize these support relationships. 



3.3 Phase-Space Support of Frame Elements 

We also study the location-orientation behavior of frame elements, i.e., the attribution of 
regions in an orientation/location domain as regions of significant activity in a distribution 
$f$. The wavefront set gives a qualitative way to do this; we use the continuous curvelet 
transform (CCT) to do so quantitatively. This transform is defined by 

\[ \Gamma_f(a, b, \theta) = \langle \gamma_{a,b,\theta}, f \rangle \]

and is indexed by triples $(a, b, \theta)$, where $a > 0$, $b \in \mathbb{R}^2$, and $\theta \in [0, \pi)$; this associates a 
function $f$ to a scale/location/direction domain. There is a natural measure on this domain: 

\[ m(da, db, d\theta) = a^{-3} \cdot da \cdot db \cdot d\theta. \]

Figure 3: Effective supports of wavelets and curvelets at different scales. The shapes of 
those supports become increasingly distinct with increasingly fine scales. 



Consider a high-pass function $f$, i.e., one whose Fourier transform vanishes near the 
origin: $\hat{f}(\xi) = 0$ for $|\xi| < \pi/a_0$, see [7]. The CCT offers a Parseval relationship for high-pass 
functions: 

\[ \|f\|_2^2 = \int |\Gamma_f(a, b, \theta)|^2 \, m(da, db, d\theta); \]

see [8]. Hence the energy in $f$ is distributed through the scale-location-direction domain by 
the curvelet transform, offering a portrait of the function's significant activity. 
Consider the transform of a radial wavelet: 

\[ \Gamma_{\psi_{a_0,b_0}}(a, b, \theta) = \langle \gamma_{a,b,\theta}, \psi_{a_0,b_0} \rangle, \quad a > 0, \; b \in \mathbb{R}^2, \; \theta \in [0, \pi). \]

Because $\psi_{a_0,b_0}$ is radial, $\Gamma_{\psi_{a_0,b_0}}(a, b, \theta)$ is constant, independent of $\theta$, and decays rapidly in the 
variables $|\log_2(a/a_0)|$ and $|b - b_0|/a_0$. It is morally localized to a cell of the form 

\[ W(a_0, b_0) = \{(a, b, \theta) : |\log_2(a/a_0)| \le 1, \; \theta \in [0, \pi), \; |b - b_0|/a_0 \le 1\}; \]

we have the following formal statement, proved in Subsection 9.2.1 below: 

Lemma 3.3 For each $N = 1, 2, \ldots$, there is a constant $c_N$ so that 

\[ |\langle \gamma_{a,b,\theta}, \psi_{a_0,b_0} \rangle| \le c_N \cdot a^{1/4} \cdot 1_{\{|\log_2(a/a_0)| < 3\}} \cdot \langle |b - b_0|_{a,\theta} \rangle^{-N}. \]

It implies the following for a scale-conditional phase portrait of a wavelet (i.e., we freeze 
the analysis scale $a$ at a specific value, and inspect $\Gamma_{\psi_{a_0,b_0}}(a, b, \theta)$ as a function of the variables 
$b$ and $\theta$): when freezing $a = a_0$, we see that $\Gamma$ is 'morally' supported in a vertical tube above 
the point $b_0$; each horizontal cross-section is a ball of width $a_0$, i.e., $B(a_0, b_0)$. 

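Consider now the transform of a curvelet: 

\[ \Gamma_{\gamma_{a_0,b_0,\theta_0}}(a, b, \theta) = \langle \gamma_{a,b,\theta}, \gamma_{a_0,b_0,\theta_0} \rangle, \quad a > 0, \; b \in \mathbb{R}^2, \; \theta \in [0, \pi). \]
This is 'morally' localized to a cell of the following form: 

\[ Q(a_0, b_0, \theta_0) = \{(a, b, \theta) : |\log_2(a/a_0)| \le 1, \; |\theta - \theta_0| \le \sqrt{a_0}, \; |b - b_0|_{a_0,\theta_0} \le 1\}; \]
the correct formal statement being [see (21) in [8]] 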

Figure 4: Left panel: effective supports of wavelets (blue) and curvelets (red); compare 
Figure 3. Right panel: phase space portrait, obtained by CCT, of the same wavelets 
and curvelets. Effective supports of curvelets are indicated in the $(b_1, b_2)$ plane. The 
portrait conveys the intuition that while the effective supports of wavelets and curvelets 
may overlap substantially in the spatial domain, in the full phase space, the relative overlap 
is significantly reduced. 



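Lemma 3.4 For each $N = 1, 2, \ldots$, there is a constant $c_N$ so that 

\[ |\langle \gamma_{a,b,\theta}, \gamma_{a_0,b_0,\theta_0} \rangle| \le c_N \cdot 1_{\{|\log_2(a/a_0)| < 3\}} \cdot 1_{\{|\theta - \theta_0| < 10\sqrt{a_0}\}} \cdot \langle |b - b_0|_{a_0,\theta_0} \rangle^{-N}. \]

(We remind the reader of our convention that, for two points $\theta, \theta' \in [0, \pi)$, $|\theta - \theta'|$ really 
means geodesic distance in $\mathbb{P}^1$.) 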

Thus each curvelet is supported in scale, location, and direction in a set which effectively 
has a product structure, and is compactly supported in both scale and orientation. In a 
scale-conditional phase portrait, freezing $a = a_0$, we see a vaguely ellipsoidal structure with 
slice $\theta = \theta_0$ exhibiting an anisotropic footprint, like $E(a_0, b_0, \theta_0)$. 

In Figure 4 we visualize the scale-conditional portraits of a wavelet and a curvelet. The 
intuition to be fostered from these figures is that curvelets don't interact very heavily with 
wavelets, because they have such different support in $(b, \theta)$, even when they have the same 
scale and location parameters, $a_0$ and $b_0$. 

This support structure of curvelets is notable for other reasons. The support structure 
is implicit in the natural measure: 

\[ m(da, db, d\theta) = a^{-3} \cdot da \cdot db \cdot d\theta = \frac{da}{a} \cdot \frac{db}{a^{3/2}} \cdot \frac{d\theta}{a^{1/2}}, \quad 0 < a < a_0, \; b \in \mathbb{R}^2, \; \theta \in \mathbb{P}^1. \]

The second expression has the following interpretation. We assign roughly unit measure to 
curvelet cells, 

\[ m(Q(a, b, \theta)) \approx 1, \quad 0 < a < a_0, \; b \in \mathbb{R}^2, \; \theta \in \mathbb{P}^1; \]

there are about $a^{-3/2}$ locations per unit volume at a fixed scale and orientation and about 
$a^{-1/2}$ orientations at a fixed scale and location. 
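
The claim that curvelet cells carry roughly unit measure can be checked in closed form. The sketch below is ours, not the paper's: it assumes the cell is the product of the scale band $a_0/2 \le a \le 2a_0$, the ellipse with semi-axes $a_0$ and $\sqrt{a_0}$, and the orientation window of half-width $\sqrt{a_0}$, and it integrates the density $a^{-3}$ over that product. The function name `cell_measure` is illustrative.

```python
import math

def cell_measure(a0):
    """Measure of a curvelet cell under m(da, db, dtheta) = a^{-3} da db dtheta.

    Assumed cell: a0/2 <= a <= 2*a0, b in an ellipse with semi-axes a0 and
    sqrt(a0), and |theta - theta0| <= sqrt(a0) (with sqrt(a0) < pi/2).
    """
    # Integral of a^{-3} over [a0/2, 2*a0], in closed form.
    scale_part = 0.5 * ((a0 / 2) ** -2 - (2 * a0) ** -2)
    # Area of the ellipse in the location variable b.
    location_part = math.pi * a0 * math.sqrt(a0)
    # Length of the orientation window.
    orientation_part = 2 * math.sqrt(a0)
    return scale_part * location_part * orientation_part

for a0 in (1e-1, 1e-2, 1e-4):
    print(a0, cell_measure(a0))
```

Under these assumptions the powers of $a_0$ cancel exactly ($a_0^{-2} \cdot a_0^{3/2} \cdot a_0^{1/2}$), so the measure of a cell is the same constant at every scale; this is the sense in which the measure assigns comparable weight to all cells.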

The support structure is mimicked by the discretization of the curvelet tight frame, 
which morally breaks the $(a, b, \theta)$ domain into disjoint cells $Q(a_j, b_{j,k,\ell}, \theta_{j,\ell})$ and samples the 
continuous transform once per cell; for details see [8]. 

3.4 CCT of Singularities 

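The CCT can be used to analyze the singularities $\mathcal{P}$ and $\mathcal{C}$ directly. Let's impose regularity 
conditions on $\mathcal{C}$. 

Let the Hausdorff pseudo-distance $d$ between $b$ and a curve $\tau$ be defined by 

\[ d(a, b, \theta; \tau) = \min\{|x - b|_{a,\theta} : x \in \text{image}(\tau)\}. \]

Definition 3.1 A finite-length planar curve $\tau$ will be called parabolically regular if, for 
$N = 1, 2, \ldots$, there is a constant $c_N$ so that for $a \in (0, 1)$ and all $b$, $\theta$, 

\[ \int \langle |\tau(t) - b|_{a,\theta} \rangle^{-N} \, dt \le c_N \cdot a^{1/2} \cdot \langle d(a, b, \theta; \tau) \rangle^{-N}. \tag{3.2} \]

Traditional nice curves, such as line segments, circles, etc., are parabolically regular, 
though we skip the demonstration. 

Lemma 3.5 Let the singularity $\mathcal{C}$ be defined as in (1.2) by a parabolically regular curve $\tau$. 
Then, for each $N = 1, 2, \ldots$, 

\[ |\Gamma_{\mathcal{C}}(a, b, \theta)| \le c_N \cdot a^{-1/4} \cdot \langle d(a, b, \theta; \tau) \rangle^{-N}. \]

In words, the CCT may be large at phase space points close to $\tau$, but elsewhere it is 
very small. 

Proof. The definition of $\mathcal{C}$ as a linear functional on the Schwartz space gives 

\[ \Gamma_{\mathcal{C}}(a, b, \theta) = \int \gamma_{a,b,\theta}(\tau(t)) \, dt. \]
The estimates (3.1) and (3.2) give 

\[ |\Gamma_{\mathcal{C}}(a, b, \theta)| \le \int c_N \cdot a^{-3/4} \cdot \langle |\tau(t) - b|_{a,\theta} \rangle^{-N} \, dt \]

\[ \le c_N \cdot a^{-3/4} \cdot c'_N \cdot a^{1/2} \cdot \langle d(a, b, \theta; \tau) \rangle^{-N} \]
\[ = c''_N \cdot a^{-1/4} \cdot \langle d(a, b, \theta; \tau) \rangle^{-N}. \quad \Box \]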

3.5 Heuristics 

We are now in a position to give a heuristic explanation why the strategy announced in 
Section 2.2 is likely to work. In effect, there are very instructive analogies between the 
calculations needed to implement that strategy and the behavior of certain 'tubes' in phase 
space. The reader will have noticed that the curvelet parameter $(a, b, \theta)$ and the wavefront 
set parameter $(b, \theta)$ differ only by the former's provision of a scale. Hence there is some 
analogy between the scale-conditional portrait by CCT and the wavefront set: both provide 
measures of the activity of an object, indexed by location and orientation. 

In effect, the scale-conditional portrait by CCT is a "thickened-out" version of the 
wavefront set. A point singularity has a wavefront set which is a vertical line in phase space, 
and a scale-conditional portrait which is localized near a thin vertical tube. A curvilinear 
singularity has a wavefront set which is a curve in phase space, while Lemma 3.5 says that, 
morally, the curvelet transform of the object $\mathcal{C}$ 'lives' near a tube. The tube in question has 
thickness $\sim a$. Look at the scale-conditional portrait, and define the tube 

\[ T(a_0) = \bigcup_{b,\theta} \left( E(a_0, b, \theta) \times \{\theta\} \right), \]

where the union is over $b$, $\theta$ satisfying 

\[ d((a_0, b, \theta); \tau) \le 1. \]

As $a_0 \to 0$, this tube shrinks down to a curve $\tilde{\tau}$ in phase space $\mathbb{R}^2 \times \mathbb{P}^1$ defined by 

\[ \tilde{\tau}(t) = (\tau(t), \theta(t)), \]

where $\theta(t)$ is the orientation of the normal to $\tau(t)$. In short, in the sense of set convergence, 

\[ T(a_0) \to WF(\mathcal{C}), \quad a_0 \to 0. \]

Thus, the wavefront set and the curvelet transform both signal that the activity in location-orientation 
space is concentrated near image($\tilde{\tau}$). 

More is true. A wavelet has a scale-conditional portrait which is a thin vertical tube, 
similar to the phase portrait of a point singularity, while a curvelet has a scale-conditional 
portrait which is a tube surrounding a little 'piece of a curve' in phase space, i.e., it morally 
has a position and orientation. 

This visual analogy suggests that curvelets are incoherent to wavelets because of the 
low overlap in phase space. Indeed, from Parseval, 

\[ \int \Gamma_{\psi_{a_0,b_0}}(a, b, \theta) \, \Gamma_{\gamma_{a_1,b_1,\theta_1}}(a, b, \theta) \, m(da, db, d\theta) = \langle \psi_{a_0,b_0}, \gamma_{a_1,b_1,\theta_1} \rangle, \]

and so the low overlap between the two phase portraits will cause relatively low 
singleton coherence (2.2). The tubelet associated to a given curvelet and the tube 
associated to a given wavelet visibly have relatively small overlap in the scale-conditional 
phase portrait; for example, if we compare the overlap of effective supports in phase space 
to the overlap of effective supports in the spatial domain, we see that the fractional overlap 
is dramatically smaller at fine scales in the phase space portrait than it is in the spatial 
domain portrait. 

However, the singleton coherence is not sufficiently small to be powerful in the present 
setting. Instead, this paper develops cluster coherence. The visual analogy presented in 
Figure 5 suggests how to bound the cluster coherence and suggests that the proof strategy 
of Section 2.2 will succeed. To understand that analogy, let's study Figure 5. If we let $S_1$ 
denote the set of significant wavelet coefficients in the radial wavelet transform of $\mathcal{P}$ at scale 
$a_0$, and $S_2$ denote the set of significant curvelet coefficients in the curvelet transform of $\mathcal{C}$ 
at scale $a_0$, we believe the reader will be easily able to motivate the following assertions on 
the basis of Figures 3-5: 

• Wavelets in $S_1$ are associated with vertical tubes clustering around the point singularities 
in $\mathcal{P}$; 

• Curvelets in $S_2$ are associated with tubes clustering around the curvilinear phase 
portrait of $\mathcal{C}$; 

• No single wavelet's phase portrait overlaps much with the cluster of curvelet phase 
portraits of $S_2$; 

• No single curvelet's phase portrait overlaps much with the cluster of wavelets in $S_1$. 

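Let's outline a pseudo-calculation inspired by these visual observations. First, we consider 
a pseudo-calculation of the cluster coherence, 

\[ \mu_c(S_{1,j}, \text{Wavelet scale } j; \text{Curvelet scale } j) = \sup_\eta \sum_{\lambda \in S_{1,j}} |\langle \gamma_\eta, \psi_\lambda \rangle|. \]
With $(\psi_i)_i$ an enumeration of the significant wavelets in the expansion at scale $a_j = 2^{-j}$, 

\[ \sum_i |\langle \psi_i, \gamma_\eta \rangle| = \sum_i \left| \iiint \Gamma_{\psi_i}(a, b, \theta) \, \Gamma_{\gamma_\eta}(a, b, \theta) \, dm \right| \]

\[ \le \sum_i \iiint |\Gamma_{\psi_i}|(a, b, \theta) \, |\Gamma_{\gamma_\eta}|(a, b, \theta) \, dm \]

\[ \le \iiint |\Gamma_{\gamma_\eta}|(a, b, \theta) \sum_i |\Gamma_{\psi_i}|(a, b, \theta) \, dm. \]

Now use the bounds on $|\Gamma_{\psi_i}|(a, b, \theta)$ given above in Lemma 3.3, and deploy the slogan that 
only a bounded number of significant wavelets at any given scale interact strongly with any 
specific phase space point; we have 

\[ \sum_i |\Gamma_{\psi_i}|(a, b, \theta) \lesssim C_1 \cdot a_j^{1/4}, \]

where the sum is over the significant wavelets at scale $a_j$ and $C_1$ does not depend on $j$, and 
the symbol $\lesssim$ indicates an inequality motivated heuristically, in this case by the preceding 
italicized slogan. We also observe that Lemma 3.4 implies that the integral over phase 
space of a curvelet phase portrait obeys $\iiint |\Gamma_{\gamma_\eta}|(a, b, \theta) \, dm \le C_2$. Combining this with 
the previous displays, we pseudo-conclude that 

\[ \mu_c(S_{1,j}, \text{Wavelet scale } j; \text{Curvelet scale } j) \to 0, \quad j \to \infty. \]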

Next, we consider a pseudo-calculation of the cluster coherence 

\[ \mu_c(S_{2,j}, \text{Curvelet scale } j; \text{Wavelet scale } j) = \sup_\lambda \sum_{\eta \in S_{2,j}} |\langle \gamma_\eta, \psi_\lambda \rangle|. \]

With $(\gamma_i)_i$ an enumeration of the significant curvelets in the expansion at scale $a_j$, 
then for a fixed wavelet index $\lambda$ we have 

\[ \sum_i |\langle \gamma_i, \psi_\lambda \rangle| = \sum_i \left| \iiint \Gamma_{\gamma_i}(a, b, \theta) \, \Gamma_{\psi_\lambda}(a, b, \theta) \, dm \right| \le \iiint |\Gamma_{\psi_\lambda}|(a, b, \theta) \sum_i |\Gamma_{\gamma_i}|(a, b, \theta) \, dm. \]

Now use the bounds on $|\Gamma_{\gamma_i}|(a, b, \theta)$ given above in Lemma 3.4, and apply the slogan that 
only a bounded number of significant curvelets at any given scale interact strongly with a 
given phase space point. We have 

\[ \sum_i |\Gamma_{\gamma_i}|(a, b, \theta) \lesssim C_1, \]

where the sum is over the significant curvelets at scale $a_j$ and $C_1$ is a constant. We also 
observe that Lemma 3.3 implies that the integral over phase space of a wavelet phase 
portrait obeys $\iiint |\Gamma_{\psi_\lambda}|(a, b, \theta) \, dm \le C_2 \cdot a_0^{1/4}$, where $a_0 = a_0(\lambda)$. Combining this with 
the previous displays, we pseudo-conclude that 

\[ \mu_c(S_{2,j}, \text{Curvelet scale } j; \text{Wavelet scale } j) \to 0, \quad j \to \infty. \]



We now turn to the approximation tasks posed by the strategy announced in Section 2.2. 
To pseudo-bound $\delta_{1,j}$, we fix $\epsilon > 0$ and define the tube in phase space $T_{1,j}$ consisting of all 
scale/location pairs $(a, b)$ where the bound provided in Lemma 4.1 permits coefficients larger 
than $a^{-\epsilon}$. Also, we let $i_j$ denote the index in the wavelet enumeration beyond which such 
potentially significant coefficients can no longer arise. We can heuristically approximate 
a sum of wavelet coefficients with an integral over the phase space region covered by the 
union of their phase space supports; then we have 

\[ \sum_{i > i_j} |\langle \psi_i, \mathcal{P}_j \rangle| \lesssim \iiint_{T_{1,j}^c} c_N \cdot 2^{j/2} \cdot \langle |b/a_j| \rangle^{-N} \, dm. \]

For sufficiently large $N$, the integrand has powerful decay; for large $j$, the width of the tube 
$T_{1,j}$ is significantly wider than the decay scale $a_j$; and so the integral becomes negligible 
for large $j$, i.e., we pseudo-conclude that $\delta_{1,j} \to 0$. To pseudo-bound $\delta_{2,j}$ we fix $\epsilon > 0$ and 
define the tube in phase space $T_{2,j}$ consisting of all triples $(a, b, \theta)$ where the bound provided 
in Lemma 3.5 permits coefficients larger than $a^{-\epsilon}$. Also we let $i_j$ denote the index in the 
curvelet enumeration beyond which such potentially significant coefficients can no longer 
arise; again heuristically identifying a sum with a phase-space integral we have 

\[ \sum_{i > i_j} |\langle \gamma_i, \mathcal{C}_j \rangle| \lesssim \iiint_{T_{2,j}^c} c_N \cdot a_j^{-1/4} \cdot \langle d(a_j, b, \theta; \tau) \rangle^{-N} \, dm. \]

For large $N$ the integrand decays strongly; for large $j$ the width of the tube is significantly 
wider than the decay scale $a_j$; and so again the integral becomes negligible for large $j$, i.e., 
we pseudo-conclude that $\delta_{2,j} \to 0$. 

In short, phase space diagrams, some elementary estimates motivated by tube overlaps, 
and some cardinality 'slogans' combine to show plausibility of the strategy announced in 
Section 2.2. In the sections to come, we rigorously carry out that strategy. While the details 
are much more delicate than this plausibility argument would suggest, the architecture of 
our full demonstration remains faithful to the geometric viewpoint. 

Figure 5: Phase space portrait of a cluster of curvelets and one single wavelet. Note the 
visually small overlap of these two geometrical objects. This suggests by analogy that each 
single wavelet is not coherent with anything built from such a cluster of curvelets. 



4 The Cluster $S_{1,j}$ and its estimates 

In this section, we define the cluster of wavelet coefficients $S_{1,j}$ of the filtered point singularity 
$F_j \star \mathcal{P}$, and estimate relative sparsity as well as cluster coherence using this cluster. 
We intend to show that with this definition of cluster set, 

\[ \delta_{1,j} = o(\|f_j\|_2), \quad j \to \infty, \tag{4.1} \]

and 

\[ \mu_c(S_{1,j}, \{\psi_\lambda\}; \{\gamma_\eta\}) \to 0, \quad j \to \infty. \tag{4.2} \]

As explained in Section 2.2, this gives the needed part of Theorem 1.1 having to do with 
WLOG we can assume that 

\[ \mathcal{P} = |x|^{-3/2}. \]

The result for the more general $\mathcal{P}$ of (1.1) follows easily by combining translation invariance 
with finitely many uses of the triangle inequality. We also, from now on, fix some 

\[ \epsilon \in (0, 1/32). \]

Our first lemma is used frequently in what follows, and is crucial for our definition of 
the cluster of wavelet coefficients. For the proof, see Subsection 9.3.1. 

Lemma 4.1 For each $N = 1, 2, \ldots$, there is a constant $c_N$ so that 

\[ |\langle \psi_{a_{j'},b}, \mathcal{P}_j \rangle| \le c_N \cdot 2^{j/2} \cdot 1_{\{|j - j'| \le 1\}} \cdot \langle |b/a_{j'}| \rangle^{-N}, \quad \forall j, j' \in \mathbb{Z}_+, \; \forall b \in \mathbb{R}^2. \]

In line with the heuristics of the previous section, we think of our estimate as describing 
relative overlaps of tubes in phase space. However, in the particular case of wavelets, there 
is no directional selectivity, so all that matters is the projection of phase space onto the 
spatial domain. We measure spatial distances with 

\[ d_2(x, A) = \min_{a \in A} \|x - a\|_2, \quad x \in \mathbb{R}^2, \; A \subset \mathbb{R}^2, \]

the Euclidean distance between a point $x$ and a set $A$. 

Of course, since we are dealing with frames, ultimately we have to consider discrete 
indices. To support geometric intuition, most of our arguments will be in the continuum 
setting, restrictions to discrete sampling grids being delayed as late in each argument as 
possible. 

Morally, the points in phase space associated with significant wavelet coefficients are 
contained in a tube around $WF(\mathcal{P})$ in phase space. This neighborhood of $WF(\mathcal{P})$ can be 
explicitly defined by 

\[ \mathcal{N}_1^{PS}(a) = \{b \in \mathbb{R}^2 : d_2(b, \{0\}) \le D_1(a)\} \times [0, \pi), \]

where 

\[ D_1(a) = a^{1-\epsilon}. \]

The shape of the tube reflects the isotropic behavior of $WF(\mathcal{P})$. For an illustration of 
$\mathcal{N}_1^{PS}(a)$, we refer to Figure 6. 

We define the cluster of wavelet coefficients around the point singularity by intersecting 
the tube $\mathcal{N}_1^{PS}(a)$ with the wavelet lattice, i.e., 

\[ S_{1,j} = \{(j, k) \in \Lambda_j : \{b_{j,k}\} \times [0, \pi) \subseteq \mathcal{N}_1^{PS}(a_j)\}. \]

The remainder of the section establishes (4.1)-(4.2) for this definition of cluster. 

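4.1 Size of $f_j$ 

Lemma 4.2 For some $c > 0$, 

\[ \|f_j\|_2 \ge c \, 2^{j/2}, \quad j \to \infty. \]

Proof. Apply (1.3) and (1.4): 

\[ \|f_j\|_2^2 = \int_{\mathbb{R}^2} |W(|\xi|/2^j)|^2 \, |\hat{f}(\xi)|^2 \, d\xi \ge c \cdot \int_{A_j} |\xi|^{-1} \, d\xi \ge c \cdot 2^j. \quad \Box \]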
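
4.2 $S_{1,j}$ offers low approximation error 

Now we are ready to state and prove the approximation error bound for $S_{1,j}$. 

Lemma 4.3 

\[ \delta_{1,j} = \sum_{\lambda \in S_{1,j}^c} |\langle \psi_\lambda, \mathcal{P}_j \rangle| = o(\|f_j\|_2), \quad j \to \infty. \]

Proof. Due to the specific filtering we use, 

\[ \delta_{1,j} = \sum_{\lambda \in S_{1,j}^c} |\langle \psi_\lambda, \mathcal{P}_j \rangle| = \sum_{j'=j-1}^{j+1} \; \sum_{\{k : \{b_{j',k}\} \times [0,\pi) \not\subseteq \mathcal{N}_1^{PS}(a_{j'})\}} |\langle \psi_{j',k}, \mathcal{P}_j \rangle|. \]

Applying Lemma 4.1, and picking $N$ so large that $(N - 1)\epsilon > 1/2$, 

\[ \delta_{1,j} \le c \cdot \sum_{|k| > 2^{j\epsilon}} c_N \cdot 2^{j/2} \cdot \langle |k| \rangle^{-N} = o(1), \quad j \ge j_0. \]

So the lemma is proved. □ 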
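
The mechanism behind this proof is the polynomial decay of a lattice tail sum. The following sketch is an illustration only: the names `bracket` and `tail_sum`, the truncation box `K`, and the choice $N = 6$ are our assumptions, not part of the paper. It evaluates the sum of $\langle |k| \rangle^{-N}$ over $k \in \mathbb{Z}^2$ with $|k| > R$ and rescales by $R^{N-2}$.

```python
import math

def bracket(t):
    """<t> = (1 + t^2)^{1/2}, the notation used in the decay estimates."""
    return math.sqrt(1.0 + t * t)

def tail_sum(R, N, K=200):
    """Sum of <|k|>^{-N} over lattice points k in Z^2 with |k| > R.

    The infinite sum is truncated to the box |k_1|, |k_2| <= K (an
    assumption of this sketch; the neglected terms are tiny for N >= 4).
    """
    total = 0.0
    for k1 in range(-K, K + 1):
        for k2 in range(-K, K + 1):
            r = math.hypot(k1, k2)
            if r > R:
                total += bracket(r) ** (-N)
    return total

N = 6
for R in (4, 8, 16):
    # Rescaling by R^(N-2) should give values of comparable size.
    print(R, tail_sum(R, N) * R ** (N - 2))
```

The rescaled values stay bounded, consistent with a tail of order $R^{-(N-2)}$. With $R = 2^{j\epsilon}$ this produces a factor $2^{-j\epsilon(N-2)}$, which overwhelms the prefactor $2^{j/2}$ once $N$ is taken large enough relative to $1/\epsilon$.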

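4.3 $S_{1,j}$ offers low cluster coherence 
Lemma 4.4 

\[ \mu_c(S_{1,j}, \{\psi_\lambda\}; \{\gamma_\eta\}) \to 0, \quad j \to \infty. \]
Proof. By Lemma 3.3, if $\lambda \in \Lambda_j$, 

\[ |\langle \psi_\lambda, \gamma_\eta \rangle| \le c \cdot 2^{-j/4}, \quad \forall \eta. \]
The definition of $S_{1,j}$ via $\mathcal{N}_1^{PS}(a_j)$ implies that 

\[ \#(S_{1,j}) \le c \cdot 2^{2\epsilon j}, \quad \text{for sufficiently large } j. \]
We conclude from $\epsilon < 1/8$ that 

\[ \sup_\eta \sum_{\lambda \in S_{1,j}} |\langle \psi_\lambda, \gamma_\eta \rangle| \le c \cdot 2^{-j(1/4 - 2\epsilon)} \to 0, \quad j \to \infty. \quad \Box \]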
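
The counting argument in this proof can be illustrated numerically. The sketch below rests on several assumptions of ours: all constants are set to 1, only the single scale $j$ is counted, the wavelet lattice is taken to be $b_{j,k} = k \cdot 2^{-j}$ with $k \in \mathbb{Z}^2$, and $\epsilon = 1/64$ is an arbitrary admissible choice. The function names are illustrative.

```python
import math

def cluster_size(j, eps):
    """Number of lattice points k in Z^2 with |k| <= 2^{j*eps}.

    Under the assumed lattice b_{j,k} = k * 2^{-j}, these are the positions
    with |b_{j,k}| <= a_j^{1-eps}, i.e. inside the tube of radius D_1(a_j).
    """
    r = 2.0 ** (j * eps)
    m = int(math.floor(r))
    return sum(1
               for k1 in range(-m, m + 1)
               for k2 in range(-m, m + 1)
               if k1 * k1 + k2 * k2 <= r * r)

def coherence_bound(j, eps):
    """Cluster size times the singleton bound 2^{-j/4} (constants set to 1)."""
    return cluster_size(j, eps) * 2.0 ** (-j / 4.0)

eps = 1.0 / 64.0
for j in (20, 100, 200):
    print(j, cluster_size(j, eps), coherence_bound(j, eps))
```

The cluster grows like $2^{2\epsilon j}$ while each term shrinks like $2^{-j/4}$, so the product decays whenever $\epsilon < 1/8$, which is the comparison made in the last display of the proof.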

5 Sparse Expansion of a Linear Singularity 

After the relative ease with which we obtained concentration estimates (4.1)-(4.2) for the 
cluster of significant wavelet coefficients, we must now brace ourselves for the considerably 
harder challenge posed by the analogous estimates for the cluster of significant curvelet 
coefficients. This extra work seems, at least to us, much more rewarding, as it involves a 
full-blown use of phase space geometry. 

In this section, we develop essential infrastructure for the analysis to come, documenting 
the sparsity of curvelet coefficients of a special linear singularity. Let $w_2 : \mathbb{R} \to [0, 1]$ be a 
smooth function to be specified later (cf. Subsection 6.2), supported in $[-1, 1]$, and define 
the very special distribution $w\mathcal{L}$ supported on a line segment $\{0\} \times [-\rho, \rho]$ by 

\[ w\mathcal{L} = w_2(x_2/\rho) \cdot \delta_0(x_1). \]

Then we can write 

\[ \widehat{w\mathcal{L}} = \hat{w} \star \hat{\mathcal{L}}, \]

where 

\[ \hat{w} = \hat{w}_2(\rho \xi_2) \cdot \rho \cdot \delta_0(\xi_1) \quad \text{and} \quad \hat{\mathcal{L}} = \delta_0(\xi_2). \]
Thus the action of $w\mathcal{L}$ on a continuous function $f$ is given by 

\[ 2\pi \langle w\mathcal{L}, f \rangle = \langle \hat{\mathcal{L}}, \hat{w} \star \hat{f} \rangle = \int (\hat{w} \star \hat{f})(\xi_1, 0) \, d\xi_1. \tag{5.1} \]

Conceptually, $w\mathcal{L}$ is a straight curve fragment; our analysis of $\mathcal{C}$ in Section 6 will reduce to 
the study of this case. 

Figure 6: The tubes $\mathcal{N}_1^{PS}(a)$ and $\mathcal{N}_2^{PS}(a)$ in phase space. The clusters of significant coefficients 
correspond roughly to phase-space support regions overlapping these tubes. 

Define a tube in phase space, in which the significant curvelet coefficients will be located. 
This will now be a neighborhood of $WF(w\mathcal{L})$, defined by 

\[ \mathcal{N}_2^{PS}(a) = \{b \in \mathbb{R}^2 : d_2(b, \{0\} \times [-2\rho, 2\rho]) \le D_2(a)\} \times [0, \sqrt{a}], \]

where 

\[ D_2(a) = a^{1-\epsilon}. \]

For an illustration of $\mathcal{N}_2^{PS}(a)$, and its relation to $\mathcal{N}_1^{PS}(a)$, we refer to Figure 6. The actual 
definition of the cluster of curvelet coefficients is much more involved. In Lemma 5.4, we 
will introduce a first set which helps to determine its location. 

Several bounds will control the curvelet coefficients of a linear singularity. Lemma 3.5 
gives 

\[ |\langle w\mathcal{L}, \gamma_{a,b,\theta} \rangle| \le c \cdot a^{-1/4}, \quad \forall a, b, \theta; \tag{5.2} \]

in fact that lemma even gives a decay estimate, as the microlocation $(b, \theta)$ moves away from 
$(\{0\} \times [-\rho, \rho]) \times \{0\}$. In the situations where we would use that decay estimate, the next 
lemma is more convenient. 



Lemma 5.1 Suppose that 9 € [0, ^/a\, and set 

T := cos sin 6'(a~^ — a~^), dl := 6^(ct| — a^'^r), 

and 

(f ■= I min± ((±p - 62)0-1 + fTf^ftir)^ : 62 - fif ^6it [-p, p], 
\ : b2-a^XTe[-p,p], 

where 

C7i = (a"2sin2 6l + a-^cos^ 0)^/2 and cr2 = (a"^ sin^ + a^^ cos^ 6^)^/2. 

Then, for N = 1,2,..., 

\{wC,^aAe)\ < CN ■ a~^/^ ■ a^' ■ {diy' ■ (1(^1,^1^2)!)'"'^. 

In some cases, spatial decay alone is insufficient and we also need to exploit directional 
localization; for such cases we employ the following lemma. 

Lemma 5.2 Suppose that 9 G (-v/a, vr). Then, for L, M = 0,1,2, ... , 

\{wC,^a,b,e)\ < CL,M ■ a-^'^ ■ I cos^l • e-''^ • (|6i|)-^ • (a^/^l gin^l + a\ cos9\)^ 
iM)'^'^ ■ (p + a^/^os^l +a|sin0|)^-^. 

Both previous lemmas will be proved in Subsections 9.4.1 and 9.4.2, respectively. Together, 
they imply that the curvelet frame coefficients of $w\mathcal{L}$ are sparse. Indeed, in the 
directional panels where $\theta$ is close to 0, we have about $2^{j/2}\rho$ significantly nonzero coefficients, 
which are bounded by $c \, 2^{j/4}$, while in the directional panels where $\theta$ is far from zero, 
we have few significantly nonzero coefficients. Formally, 

Lemma 5.3 Let $\alpha_j = (\langle w\mathcal{L}_j, \gamma_\eta \rangle)_\eta$ denote the curvelet frame coefficients of $w\mathcal{L}_j = F_j \star w\mathcal{L}$. 
For each $p > 0$, there is $c_p > 0$ so that 

\[ \|\alpha_j\|_p \le c_p \cdot 2^{j((1/2+\epsilon)/p + 1/4)}. \]

This will be proved in Section 5.1. The next result, making precise the location of the 
significant coefficients, is proved in Section 5.2. 

Lemma 5.4 Put 

\[ S_j = \{(j, k, \ell) \in \Delta_j : (b_{j,k,\ell}, \theta_{j,\ell}) \in \mathcal{N}_2^{PS}(a_j)\}. \]

Then 

Figure 7: Curvelets associated with the cases $T_1$-$T_4$. The line segment is the support of 
$w\mathcal{L}_j$. The dotted line is the affine extension of that segment. 

5.1 Proof of Lemma 5.3 

We first observe that the full curvelet coefficient vector $\alpha_j$ is simply the extension of 
$(\langle w\mathcal{L}_j, \gamma_\eta \rangle)_{\eta \in \Delta_j^{\pm 1}}$ to scales away from $\{j - 1, j, j + 1\}$ by zero filling. Also WLOG we 
can assume that $\eta \in \Delta_j$, since the terms related to the scales $j - 1$ and $j + 1$ only change 
the constant factor of the final estimate, independently of $j$. 
Define the following four regions in phase space: 



We now split the norm $\|\alpha_j\|_p$ according to each curvelet coefficient's microlocation. For a 
phase space set $\mathcal{N}$ write $\eta \sim \mathcal{N}$, meaning the set $\{\eta : \eta \in \Delta_j \text{ and } (b_{j,k,\ell}, \theta_{j,\ell}) \in \mathcal{N}\}$. We 
have the following decomposition: 



MP {a) 



{6 G R2 : d2{h, {0} X [-2p, 2p]) < L»2(a)} x [0, V^, 
({6 G B? : d2{b, {0} x R) < 1)2(0)} x [0, \ 

B? X (^/a, vr). 



(a) 



IIP 



(5.3) 



We have the following approximate equivalences: 



rjr^Mi'^iaj) « {{j,{h,k2),0) : |A;i| < 2^',k2 G [-2p/^,2p/V^]} 

rj^MPiaj) « {{j,{h,k2),0) : \k^\ < 2^' M [-2p/^,2p/^} 

V^K'^iaj) « {{j,k,0) : k E ZM^ij > 2^'}, 

Tj^MPia,) « {ij,k,£):£ = l,...,a-'^^-l}. 

Thus, in place of the original continuum-domain splitting (5.3), we consider instead the 
'discrete-domain' splitting 

= 12 Yl K«^'^j'7j,(fci,fc2),o>r+ Yl Yl K^'^i'7j,(fci,fc2),o>r 

|fci|<2J= |fc2|<2p/v^ |A:i|<2J= \k2\>2p/y^ 

-1/2 , 
a ■ ' —1 

= Ti + T2 + T3 + T^. (5.4) 

These terms correspond, respectively, to nearly vertical curvelets lying on the line segment 
singularity ($T_1$), nearly vertical curvelets centered elsewhere on the line containing the line 
segment ($T_2$), nearly vertical curvelets centered elsewhere ($T_3$), and all other curvelets ($T_4$), 
as illustrated in Figure 7. 
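To estimate $T_1$, use (5.2): 

\[ \sum_{k_2 = -2\rho/\sqrt{a_j}}^{2\rho/\sqrt{a_j}} |\langle w\mathcal{L}_j, \gamma_{j,(0,k_2),0} \rangle|^p \le c \cdot 2^{jp/4} \cdot \sum_{k_2 = -2\rho/\sqrt{a_j}}^{2\rho/\sqrt{a_j}} 1 \le c \cdot a_j^{-(p/4+1/2)}, \tag{5.5} \]

so $T_1 \le c \cdot 2^{\epsilon j} a_j^{-(p/4+1/2)} = c \, 2^{j(1/2+p/4+\epsilon)}$. Here and below, when we write a sum taken over 
integers with non-integer bounds, we implicitly mean that the sum extends over all integers 
between the bounds. 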

To derive estimates for $T_2$-$T_3$, we first transfer the estimates for $\theta = 0$ derived in Lemma 
5.1 (in terms of continuum parameters) to statements about $|\langle w\mathcal{L}_j, \gamma_{j,k,\ell} \rangle|$, in terms of the 
discrete lattice parameters. 

Lemma 5.5 For N = 1,2, . . ., 

Ku;£,-,7j,m)I < CN ■ aj'/' ■ {\h\r' ■ {aj'[\a,h\^ + minlafk^ ± p\^]'/y~'' . 

Proof. This follows directly from the 'in particular' part of Lemma 5.1 and the relation 
between continuous coefficients and lattice parameters given by (1.5). □ 
To estimate $T_2$, let $\mathcal{K}_2 = \{k_2 \in \mathbb{Z} : |k_2| > 2\rho/\sqrt{a_j}\}$. Then 

\k2\>2p/y^. 

K2 

< c;v,p-a-^/'E(l«;'^'^2-a7Vl)^^-^)^. 
For {N — 2)p > 1, we have 

J 2p/v^ . J p/aj 

This estimate concerns $k_1 = 0$. For the other cases with $|k_1| < 2^{\epsilon j}$ we use this same estimate, 
getting 
To estimate T3 let /C3 := {k G Z^, \ki\ > 2^^} and choose N so that e ■ {N - 2)p > 3^. 
Then 

Partition the set /C3 = /C|] U /C J , where /C|] = {|/c2| < 2^/9} n /C3. The sum over /C3 involves 
sites where /c2 may as well be zero; it is not asymptotically larger than the LHS in this 
display: 




Since e-{N—2)p > 5/4+e, this last term is 0{2 ^/^). The sum over K,\ is not asymptotically 
larger than the LHS of the next display; the RHS uses Lemma 9.3: 

/ / (|(xi,X2)|)('"^)^dxi(iX2 < c(|(2^^2V)|)(2-^)^'+2 < 2J-(l--({^-2)p-2)) 

Since e • (TV - 2)p > 2,\, this last term is 0{2-^l^). We conclude that 

Before estimating T4, we translate Lemma 5.2 into a simple form involving discrete 
curvelet parameters. 

Lemma 5.6 Let 

1/2 1 1/2 

= {\ajkicos6j^i — k2sm.6j^(\)~ ■ {a^ \sm.9j^i\+aj\cos6j^i\) 

1 /2 1 

• (I Oj /ci sin Oj^i + a- k2 cos ^j/ 1 ) ~ • 
There exist constants cm so that, for j, k, N = 1,2, . . ., 

Proof. This follows directly from Lemma 5.2, from $\langle v/u \rangle \le \langle v \rangle / u$ for $0 < u \le 1$, and the
relation (1.5) between continuous coefficients and lattice parameters. □
By Lemma 5.6, the term $T_4$ can now be estimated by

-1/2 -1 

€=1 A;eZ2 
Let $B_{j,\ell}$ be the rotated anisotropic Cartesian grid of curvelet coefficient locations at scale
$j$ and orientation $\theta_{j,\ell}$. Note that for $j$ large,

$a_j^{3/2} \sum_{b \in B_{j,\ell}} \langle |b_1| \rangle^{-N} \cdot \langle |b_2| \rangle^{-N} \to \int\int \langle |b_1| \rangle^{-N} \cdot \langle |b_2| \rangle^{-N} \, db_1 \, db_2, \quad j \to \infty.$

Indeed, the function $F(b) = \langle |b_1| \rangle^{-N} \cdot \langle |b_2| \rangle^{-N}$ is smooth and the above display just expresses
the fact that Riemann sums of $F$ converge to the integral of $F$. In fact it is quite evident
that the convergence is uniform in $\theta$. We conclude that



max > J"'/!^^ - ^' "-j ■ "-j ^^^"^ ■ 
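The Riemann-sum convergence invoked above can be checked numerically. The following is only an illustrative sketch under stated assumptions: it takes the bracket to be $\langle x \rangle = (1 + x^2)^{1/2}$, models the grid as $b = R_{-\theta} D_a k$ with $D_a = \mathrm{diag}(a, \sqrt{a})$ (so that each cell has area $a^{3/2}$), and fixes $N = 4$, for which the integral of $F$ equals $(\pi/2)^2$. The function names, the rotation convention, and the truncation radius are hypothetical choices, not taken from the text.

```python
import numpy as np

def bracket(x):
    """Bracket <x> = (1 + x^2)^(1/2) (assumed convention)."""
    return np.sqrt(1.0 + np.asarray(x, dtype=float) ** 2)

def riemann_sum(a, theta, N=4, radius=20.0):
    """a^(3/2) * sum of F(b) over grid points b = R_{-theta} D_a k.

    The lattice is truncated to a rotated rectangle that contains the
    disk of the given radius; F decays fast enough that the discarded
    tail is small for N = 4.
    """
    k1_max = int(np.ceil(radius / a))
    k2_max = int(np.ceil(radius / np.sqrt(a)))
    k1, k2 = np.meshgrid(np.arange(-k1_max, k1_max + 1),
                         np.arange(-k2_max, k2_max + 1), indexing="ij")
    u1, u2 = a * k1, np.sqrt(a) * k2          # D_a k (anisotropic scaling)
    c, s = np.cos(theta), np.sin(theta)
    b1 = c * u1 + s * u2                      # rotation by -theta
    b2 = -s * u1 + c * u2
    F = bracket(b1) ** (-N) * bracket(b2) ** (-N)
    return a ** 1.5 * F.sum()

exact = (np.pi / 2.0) ** 2                    # integral of F for N = 4
for theta in (0.0, 0.3, 1.0):
    print(theta, abs(riemann_sum(2.0 ** -4, theta) - exact))
```

In this sketch the printed errors are of similar size for every orientation tried, which is consistent with the uniformity in $\theta$ claimed in the text.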



We obtain 



3 1 sin 



m ^ -p/4 -A'p/2 -3/2 -PV — 2^. 

On the interval < oj < 7r/2, sin(a;)/a; > 2/tt. We have | sin(7r^y/aj)| > < £ < 

1/2 

Oj- /2. Hence, using | sin(7r/2 + a;)| = \sm{iT/2 — uj)\, 

— 1/2 

Summing the geometric series X^^^i with z = e~'^'^'^^ , we finally obtain that for all 
with Np sufficiently large: 



-1/2 , 
a. -1 



£=1 fc6Z2 

Summarizing our bounds on T1-T4 and using (5.4), we obtain: 
Lemma 5.7 For j, N = 1,2, . . . , and p > 0, the following holds. 
(i) We have 



J2 \{wCj,j^)\P < CN,p- a~j 



(l/2+£)-p/4 



{r,eA±i:(fej.fe,,,0j.f)eAr|'S(a^,)} 

(ii) We have 

\{wCj,^,)\P<CN,p-af-P^/\ 

Proof. Again reducing to scale $j$ and to the discrete setting as in the proof of Lemma 5.3,
(i) follows from $T_1$, i.e., from (5.5). (ii) follows from $T_2$-$T_4$, i.e., from (5.6), (5.7), and (5.8).

□ 

Finally, this lemma now implies that 

WctjWp < CN,p ■ a"^^^^"^^-*"^''^, 
which is what was claimed in Lemma 5.3. □ 
5.2 Proof of Lemma 5.4

Using the special properties of our subband filters, WLOG we can assume that G Aj, and 
can conclude that 

||ajl(5^,)c||i < c- \{wCj,-f^)\P. 
Applying Lemma 5.7(ii), we obtain 

< CAr,i • o° = 0(1), j oo. 

□ 

6 Sparse Expansion of a Curvilinear Singularity 

Continuing our 'infrastructure development', we now study properties of curvelet coefficients 
of a curved singularity. The strategy is to smoothly partition the curve into pieces and then 
straighten each piece, enabling us to apply results from the previous section. 

6.1 Tubular Neighborhood 

First, we develop a quantitative 'tubular neighborhood theorem'. By regularity, we note
that the radius of curvature of $\tau$ is bounded below, by $r > 0$ say. We can find $\rho$ small
compared to $r$ and an integer $m$ so that

$m \cdot \rho = \mathrm{length}(\tau)$

and so that the integrated curvature of $\tau$ on each interval $[(i-1)\rho, (i+1)\rho]$ is controlled:

$\int_{(i-1)\rho}^{(i+1)\rho} |\tau''(t)| \, dt < \epsilon.$ (6.1)

Consider the following local coordinate system in the vicinity of $\tau$. Let $t_i = i\rho$, for $i =
0, \ldots, m$, and $T^i = [t_{i-1}, t_{i+1}]$ for $i \in \{1, \ldots, m-1\}$. If $\tau$ is a closed curve, let $T^0 = [t_{m-1}, t_1]$
and $T^m = T^0$ (as $\tau(t_0) = \tau(t_m)$). Let $n_i$ be some choice of unit normal vector to $\tau(t_i)$.
For $y \in \mathbb{R}^2$ a point near $\tau(t)$, consider the closest point in image($\tau$); this has arclength
parameter

$x_2(y) = \mathrm{argmin}\{|\tau(t) - y| : 0 \le t \le \mathrm{length}(\tau)\}$

and signed distance parameter

$x_1^i(y) = \langle n_i, y - \tau(x_2(y)) \rangle \cdot \min\{|\tau(t) - y| : 0 \le t \le \mathrm{length}(\tau)\}.$

Define the correspondences

$\phi^i(y) = (x_1^i(y), x_2(y) - t_i), \quad i = 1, \ldots, m-1,$

with similar definitions, slightly amended for the case $i \in \{0, m\}$ if $\tau$ is a closed curve.
Recall the curvature bound $\epsilon$ in (6.1).
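To make the local coordinates concrete, here is a minimal numerical sketch for one illustrative curve, the unit circle parametrised by arclength. Everything in it is an assumption made for illustration: the choice of curve, the brute-force closest-point search, the function names, and the sign convention (the sign is read off from the outward normal at the closest point, which may differ in detail from the definition of $x_1^i$ in the text).

```python
import numpy as np

def tau(t):
    """Unit circle parametrised by arclength (illustrative curve)."""
    return np.stack([np.cos(t), np.sin(t)], axis=-1)

def normal(t):
    """Outward unit normal to the circle at tau(t)."""
    return np.stack([np.cos(t), np.sin(t)], axis=-1)

def tubular_coordinates(y, length=2 * np.pi, samples=200_001):
    """Return (x1, x2): signed distance to the curve and the arclength
    parameter of the closest point, found by sampling the curve."""
    y = np.asarray(y, dtype=float)
    t = np.linspace(0.0, length, samples)
    dist = np.linalg.norm(tau(t) - y, axis=1)
    idx = int(np.argmin(dist))
    x2 = t[idx]
    side = np.sign(np.dot(normal(x2), y - tau(x2)))
    return side * dist[idx], x2

def phi(y, t_i):
    """Local chart (x1, x2 - t_i) centred at the node t_i."""
    x1, x2 = tubular_coordinates(y)
    return x1, x2 - t_i

y = 1.1 * np.array([np.cos(1.0), np.sin(1.0)])
print(tubular_coordinates(y))   # close to (0.1, 1.0)
print(phi(y, 0.75))             # close to (0.1, 0.25)
```

In these coordinates the arc of the circle near $t_i$ is mapped to a piece of the vertical segment $\{0\} \times [-\rho, \rho]$, which is the straightening the section is after.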
Figure 8: The tubular neighborhood $Y_{\epsilon'} = \bigcup_i Y^i_{\epsilon'}$ of image($\tau$) and the mapping $\phi^i : Y^i_{\epsilon'} \to X_{\epsilon'}$.

Lemma 6.1 (Tubular Neighborhood Theorem) For sufficiently small $\epsilon > 0$, there is
some $\epsilon' > 0$ so that, for $X_{\epsilon'} = [-\epsilon', \epsilon'] \times [-\rho, \rho]$, we have:

• the correspondence $\phi^i$ is one-one on the set $Y^i_{\epsilon'} = (\phi^i)^{-1}[X_{\epsilon'}]$,

• the mapping $\phi^i : Y^i_{\epsilon'} \to X_{\epsilon'}$ is a diffeomorphism, and

• the mapping $\phi^i$ extends to a diffeomorphism from $\mathbb{R}^2$ to $\mathbb{R}^2$ which reduces to the
identity outside a compact set.

In what follows, $\phi^i$ always denotes the extended diffeomorphism from $\mathbb{R}^2$ to $\mathbb{R}^2$.

The set $Y_{\epsilon'} = \bigcup_i Y^i_{\epsilon'}$ is a tubular neighborhood of image($\tau$) on which we have nice
local coordinate systems, see Figure 8. This will allow us to locally bend the curve $\tau$ into
something straight.

6.2 Cutting into pieces 

Choose a $C^\infty$ function $w_2 : \mathbb{R} \to [0, 1]$ (cf. Section 5) supported in $[-1, 1]$ so that

$w_2(t/\rho) + w_2((t-1)/\rho) = 1, \quad -1/2 < t < 0,$

and

$w_2(t/\rho) + w_2((t+1)/\rho) = 1, \quad 0 < t < 1/2.$

In addition, we require W2 to satisfy 

|W2(^^)| < c- e^l'^l, UGH. (6.2) 
Define now a smooth partition of unity of $[0, 1]$ using $w_2$:

$w_{2,i}(t/\rho) = w_2((t - t_i)/\rho), \quad 1 \le i \le m-1,$

with a modification for $i \in \{0, m\}$ that depends on whether $\tau$ is closed or not. Then

$\sum_i w_{2,i}(t/\rho) = 1 \quad \forall t \in [0, 1].$ (6.3)
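A partition of unity of the kind in (6.3) can be built from any smooth bump whose integer translates sum to one. The sketch below uses one standard construction based on $e^{-1/s}$; it is not claimed to be the authors' $w_2$, and it is not checked against the Fourier decay condition (6.2). The names `g`, `step`, `w`, and `partition_sum`, and the treatment of the end nodes $i = 0$ and $i = m$ (simply included, as for an open curve), are hypothetical.

```python
import numpy as np

def g(s):
    """exp(-1/s) for s > 0 and 0 otherwise; smooth on the whole line."""
    s = np.asarray(s, dtype=float)
    safe = np.where(s > 0, s, 1.0)           # avoid dividing by s <= 0
    return np.where(s > 0, np.exp(-1.0 / safe), 0.0)

def step(s):
    """Smooth monotone step: 0 for s <= 0, 1 for s >= 1,
    with step(s) + step(1 - s) = 1."""
    s = np.asarray(s, dtype=float)
    return g(s) / (g(s) + g(1.0 - s))

def w(s):
    """Smooth bump supported in [-1, 1] with w(s) + w(s - 1) = 1 on [0, 1]."""
    return step(1.0 - np.abs(s))

def partition_sum(t, rho, m):
    """Sum over i = 0..m of w((t - t_i) / rho) with nodes t_i = i * rho."""
    t = np.asarray(t, dtype=float)
    return sum(w((t - i * rho) / rho) for i in range(m + 1))

t = np.linspace(0.0, 1.0, 1001)
print(np.max(np.abs(partition_sum(t, 0.125, 8) - 1.0)))
```

On each interval $[t_i, t_{i+1}]$ only the two bumps centred at $t_i$ and $t_{i+1}$ are nonzero, and they sum to one by the symmetry of `step`.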
This will allow us to chop the curve $\tau$ into something that can be bent.
Now define the distributions

$\mathcal{C}^i = \int_{t_{i-1}}^{t_{i+1}} w_{2,i}(t/\rho) \, \delta_{\tau(t)} \, dt;$

the partition of unity property (6.3) gives $\sum_i \mathcal{C}^i = \mathcal{C}$.

We note that $\{\tau(t) : t \in T^i\} \subset Y^i_{\epsilon'}$, and hence $\phi^i$ diffeomorphically straightens the piece
of curve $\{\tau(t) : t \in T^i\}$ into the line segment $\{0\} \times [-\rho, \rho]$.



6.3 Bending one piece 

Now consider a diffeomorphism $\phi : \mathbb{R}^2 \to \mathbb{R}^2$; it acts on the distribution $f$ by change of
variables

$\phi^* f = f \circ \phi.$

This action induces a linear transformation on the space of curvelet coefficients. With $\alpha(f)$
the curvelet coefficients of $f$ and $\beta(f)$ the curvelet coefficients of $\phi^* f$, we obtain a linear
operator

$M_\phi(\alpha(f)) = \beta(f).$

It is by now well-known that diffeomorphisms preserve sparsity of frame coefficients when
the frame is based on parabolic scaling (as with curvelets and shearlets). For example, the
following can be derived from Hart Smith's work [38] by a simple atomic decomposition.

Lemma 6.2 [8, Theorem 6.1, page 219] For $p > 0$, define the operator quasi-norm

$\|M_\phi\|_{op,p} = \max\left\{ \sup_{\eta'} \|(\langle \gamma_\eta, \phi^* \gamma_{\eta'} \rangle)_\eta\|_p, \; \sup_{\eta} \|(\langle \gamma_\eta, \phi^* \gamma_{\eta'} \rangle)_{\eta'}\|_p \right\}.$

Let $\phi$ denote a diffeomorphism that reduces to the identity outside of a compact set. Then
for $0 < p \le 1$,

$\|M_\phi\|_{op,p} \le c_p < \infty.$

Far more detailed and precise results on the invariance of curvelet coefficients under
changes of variables, with optimal regularity conditions, were developed by Candès and
Demanet [5], as we will see in the next section.

The $p$-triangle inequality $|a + b|^p \le |a|^p + |b|^p$ for $p \in (0, 1]$ implies the following:

Lemma 6.3 For $p \in (0, 1]$, a vector $\alpha = (\alpha_\eta)_\eta$, and a linear operator $M$,

$\|M\alpha\|_p \le \|M\|_{op,p} \|\alpha\|_p.$
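The inequality of Lemma 6.3 can be tried out on a finite matrix. The sketch below is only a sanity check under assumptions: it reads the operator quasi-norm as the larger of the maximal column and maximal row $\ell_p$ quasi-norms (the finite-dimensional analogue of the definition in Lemma 6.2), and it uses an arbitrary random matrix rather than an actual curvelet change-of-variables matrix. All names are hypothetical.

```python
import numpy as np

def lp_quasi_norm(v, p):
    """(sum |v_i|^p)^(1/p), a quasi-norm for 0 < p <= 1."""
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

def op_quasi_norm(M, p):
    """Max of the largest column and the largest row l_p quasi-norm."""
    cols = np.sum(np.abs(M) ** p, axis=0) ** (1.0 / p)
    rows = np.sum(np.abs(M) ** p, axis=1) ** (1.0 / p)
    return float(max(cols.max(), rows.max()))

rng = np.random.default_rng(0)
p = 0.5
# Random test data with a few large and many small entries.
M = rng.standard_normal((40, 40)) * rng.random((40, 40)) ** 6
alpha = rng.standard_normal(40) * rng.random(40) ** 6

lhs = lp_quasi_norm(M @ alpha, p)
rhs = op_quasi_norm(M, p) * lp_quasi_norm(alpha, p)
print(lhs <= rhs)
```

The check mirrors the proof: applying the $p$-triangle inequality to each entry of $M\alpha$ and summing over rows bounds $\|M\alpha\|_p^p$ by the largest column quasi-norm to the power $p$ times $\|\alpha\|_p^p$.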
6.4 Gluing pieces together

Now define

From the decomposition $\mathcal{C}_j = \sum_i \mathcal{C}_j^i$ we have

$\beta_j = \sum_{i=1}^{m} \beta_j^i.$

This decomposition allows us to relate sparsity of coefficients of the linear singularity to
those of the curvilinear singularity:

$\|\beta_j\|_p \le m^{1/p} \cdot \left(\max_i \|M_{\phi^i}\|_{op,p}\right) \cdot \|\alpha_j\|_p.$

This decomposition will be useful below; however, the above argument, which implies
sparsity, will not be enough for our main result, which also requires knowledge of the geometric
arrangement of the significant coefficients. The next section develops a much finer
estimation approach.

7 The cluster $S_{2,j}$ and its estimates

We finally turn to the definition of the cluster set $S_{2,j}$ and the decisive estimates

$\delta_{2,j} = o(\|f_j\|_2), \quad j \to \infty,$ (7.1)

and

$\mu_c(S_{2,j}, \{\gamma_\eta\}; \{\psi_\lambda\}) \to 0, \quad j \to \infty.$ (7.2)

As explained in Section 2.2, combining these results with the results of Section 4 will
complete the proof of Theorem 1.1.

We define the cluster of curvelet coefficients indirectly. We first define $S_j$, the cluster
of significant coefficients of our 'straight' model singularity $w\mathcal{L}_j$; then by cutting, bending,
and filtering, we induce a cluster for the curvilinear singularity $\mathcal{C}_j$. Set

= {{j, k, i) G Af : (bj^k/, 0j/) e AAf ^(a,)}. (7.3) 

Lemma 5.4 shows that this set contains the significant coefficients of $w\mathcal{L}_j$.

Let $M_{F_j} = (\langle \gamma_\eta, F_j * \gamma_{\eta'} \rangle)_{\eta,\eta'}$ be the filtering matrix associated with the filter $F_j$, and
recall the definition of the mapping matrix $M_{(\phi^i)^{-1}}$ from Subsection 6.3. Our analysis will
require us to consider their product, hence for the sake of brevity we define $M_j^i$ to be

$M_j^i = M_{F_j} \cdot M_{(\phi^i)^{-1}}$

and the entries of this matrix by $M_j^i(\eta, \eta')$. Further, we let $t_{\eta',n}$ denote the amplitude of
the $n$'th largest element of the $\eta'$'th column. Also let $n_j = 2^{\epsilon j}$, where $\epsilon$ was fixed at the
beginning of Section 4. We can think of $\epsilon$ being arbitrarily small, however for our analysis
the condition $\epsilon < 1/28$ will be sufficient.
Morally, what we would like to do is study a cluster of curvelet coefficients built from
the cluster pieces

$S_j^i = \{\eta : \eta' \in S_j \text{ and } |M_j^i(\eta, \eta')| > t_{\eta',n_j}\}.$

In words, $S_j^i$ consists of the 'top-$n_j$' curvelet coefficients affected by some significant
coefficient in $S_j$. The overall cluster set would then be made by combining the pieces:

$S_{2,j} = \bigcup_i S_j^i.$

While this morally explains what we do in this section, it turns out that the exact
behavior of $S_j^i$ and $S_{2,j}$ defined in this natural manner would be rather delicate. In fact,
this section uses a more robust definition of cluster set that is similar in spirit; see (7.6)-(7.7)
below. This definition depends on some more sophisticated ideas, which we now develop.

7.1 Decay Estimates for the Curvelet Representation of FIO's 

We first recall some results from [5] on sparsity of curvelet representations of Fourier Integral 
Operators (FIO's) and decay estimates of such a representation, which will later on be 
applied to the matrix Mj. 

In order to state decay estimates of the curvelet representation of FIO's, we first require a
notion of distance between two curvelet indices. A suitable distance was first introduced
by Hart Smith in [38]. Our analysis will employ results obtained by Candès and Demanet
in their work on the curvelet representation of wave propagators [5], in which they use the
following variation of Hart Smith's distance:

$d_{HS}(\eta, \eta') = |\theta_{j,\ell} - \theta_{j',\ell'}|^2 + |b_k - b_{k'}|^2 + |\langle e_\eta, b_k - b_{k'} \rangle|,$

where

$b_k = R_{\theta_{j,\ell}} D_{a_j} k$ and $e_\eta = (\cos(\theta_{j,\ell}), \sin(\theta_{j,\ell})),$

and the difference $|\theta_{j,\ell} - \theta_{j',\ell'}|$ is understood to refer to geodesic distance in $\mathbb{P}^1$. In [5],
this distance was then extended to derive a distance adapted to discrete curvelet indices,
which means, in particular, including the scaling component. For a pair of curvelet indices
$\eta = (j, k, \ell)$ and $\eta' = (j', k', \ell')$, this so-called dyadic-parabolic pseudo-distance is defined by

$\omega(\eta, \eta') = 2^{|j - j'|} \left(1 + \min\{2^j, 2^{j'}\} \, d_{HS}(\eta, \eta')\right).$
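The two formulas above translate directly into code. In the sketch below a curvelet index is represented by its scale $j$, its spatial location $b$, and its orientation $\theta$ given as continuous quantities, so no assumption about the lattice in (1.5) is needed; the `Index` type and the function names are hypothetical, and the geodesic distance on $\mathbb{P}^1$ is taken to be the angle difference modulo $\pi$.

```python
import math
from typing import NamedTuple, Tuple

class Index(NamedTuple):
    """A curvelet index, stored as scale, location and orientation."""
    j: int
    b: Tuple[float, float]
    theta: float

def angle_dist(t1: float, t2: float) -> float:
    """Geodesic distance between two orientations in P^1 (angles mod pi)."""
    d = abs(t1 - t2) % math.pi
    return min(d, math.pi - d)

def d_hs(eta: Index, eta_p: Index) -> float:
    """Variation of Hart Smith's distance, as in the display above."""
    db = (eta.b[0] - eta_p.b[0], eta.b[1] - eta_p.b[1])
    e = (math.cos(eta.theta), math.sin(eta.theta))
    return (angle_dist(eta.theta, eta_p.theta) ** 2
            + db[0] ** 2 + db[1] ** 2
            + abs(e[0] * db[0] + e[1] * db[1]))

def omega(eta: Index, eta_p: Index) -> float:
    """Dyadic-parabolic pseudo-distance."""
    return (2.0 ** abs(eta.j - eta_p.j)
            * (1.0 + min(2.0 ** eta.j, 2.0 ** eta_p.j) * d_hs(eta, eta_p)))

base = Index(4, (0.0, 0.0), 0.0)
print(omega(base, base))                         # no separation at all
print(omega(base, Index(4, (0.0, 0.5), 0.0)))    # shift across e_eta
print(omega(base, Index(4, (0.5, 0.0), 0.0)))    # shift along e_eta
```

A displacement along $e_\eta$ is penalised linearly and a displacement across it only quadratically, which is the parabolic anisotropy that the pseudo-distance is designed to capture.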

We will require the following property of this pseudo-distance: 

Lemma 7.1 [5, Prop. 2.2 (3.)] For sufficiently large $N > 0$, there is a constant $c_N > 0$
such that

v" 

Another property which will come in handy is the following estimate: 
Lemma 7.2 [5, Proof of Thm. 1.1] There exist some $N > 0$ and constant $c_N > 0$ obeying

Before we can state the next result we have to briefly recall some of the key notions
in microlocal analysis. Let $S^*(\mathbb{R}^2)$ denote the cosphere bundle of $\mathbb{R}^2$ (roughly speaking,
$\{(b_0, \theta_0) : b_0 \in \mathbb{R}^2, \theta_0 \in \mathbb{P}^1\}$), and let $\phi$ be a diffeomorphism of $\mathbb{R}^2$. Then the
associated so-called canonical transformation $\chi$ maps some element $(b_0, \theta_0)$ of phase space into
$\chi(b_0, \theta_0) = (\phi(b_0), \phi^*\theta_0)$, where $\phi^*\theta_0$ is the codirection into which the codirection $\theta_0$ based
infinitesimally at $b_0$ is mapped under $\phi$. Phrasing it differently, we can say that each
diffeomorphism of the base space $\mathbb{R}^2$ induces a diffeomorphism of phase space. Such a canonical
transformation induces a mapping of curvelet indices which, abusing notation, we again
denote by $\chi$. Since we will consider discrete curvelet coefficients $\eta$, we have to be careful
how to define this extension. In fact, we will define the image of $\eta$ to be the closest point,
using the pseudo-distance $\omega$, to the image of $\eta$ under the canonical transformation. As
already remarked in [5], choosing a different neighbor only affects the constants in the key
inequalities.

The basic insights about parabolic scaling and FIO's are already present in [38], im- 
plying sparsity of FIO's of order 0, as explained in [8]. But utilizing the dyadic-parabolic 
pseudo-distance, Candes and Demanet derived phase space decay estimates for the curvelet 
representation of FIO's of each order m, which imply sparsity, but also inform about geom- 
etry. 

Theorem 7.1 [5, Thm. 5.1] Let $T$ be a Fourier Integral Operator of order $m$ acting on
functions of $\mathbb{R}^2$. Then, for each $N > 0$, there exists some positive constant $c_N$ such that

$|\langle \gamma_\eta, T\gamma_{\eta'} \rangle| \le c_N \, 2^{mj'} \, \omega(\eta, \chi(\eta'))^{-N}.$ (7.4)

Moreover, for each $0 < p \le \infty$, $(\langle \gamma_\eta, T\gamma_{\eta'} \rangle)_{\eta,\eta'}$ is bounded from $\ell_p$ to $\ell_p$.

In the sequel we will use the first part of the result for $m = 0$. Let us now turn to the decay
estimate of the cluster approximation error $\delta_{2,j}$.

7.2 $S_{2,j}$ offers low approximation error

In this section we give two decisive lemmas which drive our analysis, and define $S_{2,j}$. From
now on $\chi^i$ denotes the extension to curvelet indices of the canonical transformation
associated with $(\phi^i)^{-1}$.

Lemma 7.3 For any $N > 0$, there exists a positive constant $c_N$ such that

$|M_j^i(\eta, \eta')| \le c_N \cdot \omega(\eta, \chi^i(\eta'))^{-N}, \quad \forall \eta, \eta'.$ (7.5)

Proof. Given $N > 0$, by Theorem 7.1, there exists some positive constant $c_N$ such that
(7.4) holds, which implies both

$|\langle \gamma_\eta, F_j * \gamma_{\eta'} \rangle| \le c_N \cdot \omega(\eta, \eta')^{-(N+1)}$ and $|M_{(\phi^i)^{-1}}(\eta, \eta')| \le c_N \cdot \omega(\eta, \chi^i(\eta'))^{-(N+1)}.$

Now applying Lemma 7.1 proves the claim. □
Lemma 7.4 There is a constant $c_1 > 0$ such that for each vector $\alpha = (\alpha_\eta)_\eta$,

$\|M_j^i \alpha\|_1 \le c_1 \cdot \|\alpha\|_1.$

Proof. This already follows from Lemma 6.3. However, it is instructive to reprove it using
(7.5) of Lemma 7.3:

$\|M_j^i \alpha\|_1 \le \sup_{\eta'} \sum_\eta |M_j^i(\eta, \eta')| \cdot \|\alpha\|_1 \le c_N \cdot \sup_{\eta'} \sum_\eta \omega(\eta, \chi^i(\eta'))^{-N} \cdot \|\alpha\|_1.$

Realizing that the $\sup_{\eta'}$ allows us to omit $\chi^i$ and applying Lemma 7.2,

$\sup_{\eta'} \sum_\eta \omega(\eta, \chi^i(\eta'))^{-N} = \sup_{\eta'} \sum_\eta \omega(\eta, \eta')^{-N} \le c_N.$ □

These two lemmas say that, in place of studying $M_j^i$ and its detailed properties, we can
simply study its majorant $c_N \cdot \omega(\eta, \chi^i(\eta'))^{-N}$. So fix $N$ large and let $\widetilde{M}_j^i$ denote the 'model'

$\widetilde{M}_j^i(\eta, \eta') = c_N \cdot \omega(\eta, \chi^i(\eta'))^{-N}.$ (7.6)

We define our cluster set $S_{2,j}$ in terms of the model rather than in terms of $M_j^i$, via

$S_{2,j} = \bigcup_i S_j^i,$ (7.7)

where

$S_j^i = \{\eta : \eta' \in S_j \text{ and } |\widetilde{M}_j^i(\eta, \eta')| > t_{\eta',n_j}\}.$

In this definition, $S_j^i$ is not truly the set of significant coefficients, but rather a set of sites
where significant coefficients could potentially occur, given the geometry of the problem; so
it is a bit larger. We still speak of $S_{2,j}$ as if it were exactly the set of significant coefficients.

The set $S_j$ is explicitly defined by a tube in phase space; the tube becomes narrower at
finer scales and 'converges' to $WF(w\mathcal{L})$. The set of potentially significant coefficients $S_j^i$ is
a much thicker set and gets progressively thicker relative to $S_j$ with increasing $j$; however,
geometrically the corresponding 'tube' is still becoming very narrow as $j$ increases. This
device already appeared in the Heuristics section; it allows us to conveniently bound all the
insignificant interactions $M_j^i(\eta, \eta')$; in particular, see the estimate of $T_2$ in the proof of
Lemma 7.5.

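We can now prove the estimate (7.1) for the cluster approximation error $\delta_{2,j}$.
Lemma 7.5

$\delta_{2,j} = o(\|f_j\|_2), \quad j \to \infty.$

Proof. As in Section 6.4, let $\beta_j = (\langle \gamma_\eta, \mathcal{C}_j \rangle)_\eta$ as well as $\beta_j^i = (\langle \gamma_\eta, \mathcal{C}_j^i \rangle)_\eta$, $i \in \{0, \ldots, m-1\}$.
The decomposition $\mathcal{C}_j = \sum_i \mathcal{C}_j^i$ implies $\beta_j = \sum_i \beta_j^i$. Now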



< m • max \^ |/3'-(r/)| < m • max ||/3* • 1 . ±u | 

i ^ — ' ■' i ■' i \ 3 
We now decompose ||/3j • l/^±i\5i||i into three components and estimate each separately. 

Let Uj := [e • j] which will be the 'radius' of the scale-neighborhood about scale j that 
we distinguish from the remaining (unimportant) scales. In what follows, remember the 
definition of Sj in (7.3); and let S^^^ = \^j,Sj' . Then 

< \\Mj{a ■ 1 -±u,) • lA±i\5dli + \\Mj{a • l-±u,) • lA±i\5dli 

Wr^-^" Vv-sjiii 

= Ti+Ts+Ts. (7.8) 
First, let's estimate Ti. From Lemma 7.4, we obtain 

j+Uj j+Uj 

T,< ||Mj(a.l^^.,^J||i<ci. II«-1a,A^/"^- 

j'=j-Uj j'=j-Uj 

By Lemma 5.4, 
hence, 

Ti < ci • {2uj + 1) = 0{2^^), j oo. (7.9) 
Next, turn to T2. Observe that 

j+Uj 

T2< E [ f^F H |M;(7?,??')|] -lla-^JIi- (7-10) 

We now need the following standard lemma about n-term approximations. 

Lemma 7.6 Let $x = (x_i)_i$ denote a sequence of numbers and let $|x|_{(n)}$ denote the $n$th-largest
element in the decreasing rearrangement. For $0 < p < 1$ we have the inequality:

$\sum_i |x_i| \cdot 1_{\{|x_i| \le |x|_{(n)}\}} \le c_p \cdot \|x\|_p \cdot n^{-(1-p)/p}, \quad n = 1, 2, \ldots.$
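The $n$-term approximation bound of Lemma 7.6 is easy to test on finite sequences. The sketch below is illustrative only: it uses the explicit constant $c_p = (2 - p)/(1 - p)$, which is one admissible choice obtained from the elementary estimate $|x|_{(m)} \le \|x\|_p \, m^{-1/p}$ and is not claimed to be the constant intended in the text; the function names are hypothetical.

```python
import numpy as np

def tail_sum(x, n):
    """Sum of |x_i| over all entries no larger than the n-th largest magnitude."""
    mags = np.abs(np.asarray(x, dtype=float))
    nth_largest = np.sort(mags)[::-1][n - 1]
    return float(mags[mags <= nth_largest].sum())

def tail_bound(x, n, p):
    """c_p * ||x||_p * n^(-(1-p)/p) with the assumed constant c_p = (2-p)/(1-p)."""
    c_p = (2.0 - p) / (1.0 - p)
    norm_p = float(np.sum(np.abs(np.asarray(x, dtype=float)) ** p) ** (1.0 / p))
    return c_p * norm_p * n ** (-(1.0 - p) / p)

rng = np.random.default_rng(0)
x = rng.standard_normal(500) * rng.random(500) ** 8   # a few large, many small entries
for n in (1, 5, 50, 500):
    print(n, tail_sum(x, n) <= tail_bound(x, n, 0.5))
```

The smaller $p$ is, the faster the bound decays in $n$, which is why the proof later chooses $p$ small in terms of $\epsilon$.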

Recalling that 5^ consists of elements ry such that |M?(t/, 77')! > t^i^^p we conclude 

-(i-p)/p ^ ^-(i-p)/p 



sup < Cp- (sup5^|Mj(r?,r?')r J 



1/p 



Also, from Lemma 5.3, we obtain \\a ■ Ic, ||i < ||a • 1a Ji < ci2J'(3/4+^). Choose p so that 
p < 4e/(3 + 7e); returning to (7.10), 



T2 < c-n-('-P)/P 2^^'(3/4+^) < c-2-(i-f)^^VP.(2n^. + i).23(i+«i)/4+i^ = 0(2^^^), j ^ 00, 

j'=J-%- 

(7.11) 
At last, we consider $T_3$. Notice that



T3< [ f^P E |Mj(7?,7?')|] -lla-lA,.,!!!. (7.12) 



Using the definition of Mj, we proceed as in the proof of Lemma 7.4, and employ the
definition of the pseudo-distance $\omega$; we obtain

sup |Mj(7/,7?')l < CAT- sup Y ^(^'^')"^ 

By Lemma 5.3, the second term in (7.12) can be estimated by \\a ■ 1Aj/||i ^ c • 2^'^^^^^^\ 
We conclude that for N sufficiently large, 

T3<CN Yl 2^^'(3/4+^)-^l^-^^'l = 0{2^'), j ^ oo. (7.13) 

li-i'l>%- 

Combining (7.9), (7.11), and (7.13) with (7.8) yields 

^^d= E l/3i(^)l = 0(2^--) = o(||/,||2), j^oo. □ 

7.3 $S_{2,j}$ offers low cluster coherence

This section proves (7.2), the asymptotically negligible cluster coherence of $S_{2,j}$.
Lemma 7.7 We have

$\mu_c(S_{2,j}, \{\gamma_\eta\}; \{\psi_\lambda\}) \to 0, \quad j \to \infty.$

Before giving the proof, we state two useful lemmas. Both use the variables $n_j$ introduced
earlier. The first lemma implies that a given significant curvelet coefficient in the analysis of
$w\mathcal{L}_j$ pushes forward to produce significant coefficients at roughly the same scale, and near
a certain fixed orientation and location. Thus the pushforward acts roughly like a rigid
motion. That first lemma is proved in Section 9.5.1.

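Definition 7.1 Let the canonical transformation $\chi$ be given. For a specific curvelet index
$\eta$ the forward set of radius $n$ is:

$\mathrm{FWD}(\eta; \chi, n) = \{\eta' : \omega(\eta', \chi(\eta)) \le n\}.$

In words, FWD is the set of curvelet indices close to the pushforward of $\eta$ by $\chi$. Note that
the forward set covers the set of significant interactions with $\eta$ under the model (7.6):

$\{\eta' : |\widetilde{M}_j^i(\eta', \eta)| > t_{\eta,n_j}\} \subset \mathrm{FWD}(\eta; \chi^i, n_j).$

Consequently

$S_{2,j} \subset \bigcup_{\eta \in S_j} \bigcup_{i \in \{0, \ldots, m\}} \mathrm{FWD}(\eta; \chi^i, n_j).$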
Lemma 7.8 Let rj = (j, k,i) be a curvelet index, with its image under the canonical trans- 
formation X* denoted by x^iv) = for a fixed i G {0, ...,m}. There exists some 
positive constant c > such that, for j sufficiently large, 

FWD{7];x\nj) C {{j',k',l') : \j' - j\ < c ■ logn,-, \2(^'-'^)/''i - <c-nj, 



3^ 



2^ • max{|6r, - fe^'P, \h - bk'\} <c- Uj} 



We conclude that for sufficiently large $j_0$, there exists $c > 0$ so that

$\#\mathrm{FWD}(\eta; \chi^i, n_j) \le c \cdot n_j^4, \quad j \ge j_0, \; \eta \in S_j.$

Let $k(\eta)$ denote the $k$-component of $\eta = (j, k, \ell)$. Define

$d_{\min}(\eta; i, n_j) = \min\{|k(\eta')| : \eta' \in \mathrm{FWD}(\eta; \chi^i, n_j)\}.$

We also need the fact that points in the forward set of $\eta$ have spatial components almost
as far from the origin as the spatial component of $\eta$ itself.

Lemma 7.9 There are $j_0, c_{1,0}, c_{2,0}$ so that we have

$d_{\min}(\eta; i, n_j) \ge c_1(j)|k(\eta)| - c_2(j), \quad j \ge j_0,$

where

$c_1(j) = c_{1,0}/n_j$ and $c_2(j) = c_{2,0} \cdot n_j.$
The proof is given in the appendix, as is the proof of
Lemma 7.10 For $N > 2$, there are constants $c_3, c_4, c_5$ such that

$\sum_{k \in \mathbb{Z}^2} \langle (a|k| - b)_+ \rangle^{-N} \le (b/a)^2 \cdot (c_3 + c_4 b^{-N}) + c_5.$

With the last three lemmas we can now prove the main result of this section.
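Proof of Lemma 7.7. The definition of $S_{2,j}$ implies

$\mu_c(S_{2,j}, \{\gamma_\eta\}; \{\psi_\lambda\}) = \max_{j',k'} \sum_{\eta \in S_{2,j}} |\langle \gamma_\eta, \psi_{j',k'} \rangle| \le \max_{j',k'} \sum_{i=0}^{m-1} \sum_{\eta \in S_j^i} |\langle \gamma_\eta, \psi_{j',k'} \rangle| \le m \cdot \max_i \max_{j',k'} \sum_{\eta \in S_j^i} |\langle \gamma_\eta, \psi_{j',k'} \rangle|.$

WLOG assume that $j' = j$ and $k' = 0$, reducing the task to proving that $\sum_{\eta \in S_j^i} |\langle \gamma_\eta, \psi_{j,0} \rangle| \to 0$
as $j \to \infty$ for all $i$. We have the estimate

$\mu_c(S_{2,j}, \{\gamma_\eta\}; \{\psi_\lambda\}) \le m \cdot \max_i \sum_{\eta \in S_j^i} |\langle \gamma_\eta, \psi_{j,0} \rangle|.$ (7.14)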
We can now continue (7.14) by applying Lemma 3.3 and the three lemmas immediately
above:

$\sum_{\eta \in S_j^i} |\langle \gamma_\eta, \psi_{j,0} \rangle| \le \sum_{\eta \in S_j} \sum_{\eta' \in \mathrm{FWD}(\eta)} |\langle \gamma_{\eta'}, \psi_{j,0} \rangle|$
$\le \sum_{\eta \in S_j} \sum_{\eta' \in \mathrm{FWD}(\eta)} c_N \cdot 2^{-j/4} \cdot \langle |k(\eta')| \rangle^{-N}$
$\le c \cdot 2^{-j/4} \cdot \sum_{\eta \in S_j} \#\mathrm{FWD}(\eta) \cdot \langle d_{\min}(\eta) \rangle^{-N}$
$\le c \cdot 2^{-j/4} \cdot n_j^4 \cdot \sum_{\eta \in S_j} \langle (c_1(j)|k(\eta)| - c_2(j))_+ \rangle^{-N}$
$\le c \cdot 2^{-j/4} \cdot n_j^4 \cdot \left( (c_2(j)/c_1(j))^2 \cdot (c_3 + c_4 \cdot c_2(j)^{-N}) + c_5 \right), \quad j \ge j_0.$

We have assumed that $\epsilon < 1/32$, hence $n_j^8 \cdot 2^{-j/4} \to 0$ as $j \to \infty$; substituting
this into (7.14) proves the lemma. □

8 Discussion
8.1 Extensions

So far we focused entirely on a very special separation problem using very specific tools of
harmonic analysis. Our goal was to show that a certain set of questions and results make
sense and provide insight. This is the 'tip of the iceberg': the main results are susceptible
of very extensive generalizations and extensions.

• More General Classes of Objects. We may vary the problem, taking point and curve
singularities whose 'strength' is different than the ones we chose in (1.1)-(1.2); however,
always matching the strength of the point singularity to that of the curve singularity.
For example, consider a 'cartoon' image model, where $\mathcal{C}$ is a function smooth away
from discontinuities, and the components of the continuity set are bounded by a
complex of smooth curves. Such cartoons still exhibit curvilinear singularities, but
the singularities are of order zero rather than order $-1$. For a separation problem
with nontrivial asymptotics, we replace the point singularity $|x - x_i|^{-3/2}$ in (1.1) by
$|x - x_i|^{-1/2}$, preserving an energy-matching condition like (1.3), with r^^ replacing
r^. Recall that, without energy matching, the whole problem is trivial. With such
changes the proof of Theorem 1.1 will run very closely in parallel. As a general rule,
if $\langle \mathcal{C}, f \rangle = \int (\Delta^\gamma f)(\tau(t)) \, dt$, where $\Delta^\gamma$ is a fractional power of the Laplacian $\Delta$, then
matching point singularities have strength $\alpha = (-3 + 4\gamma)/2$. The case we studied in
this paper was $\gamma = 0$ and hence $\alpha = -3/2$.

• Other Frame Pairs. Theorem 1.1 holds without change for many other pairs of frames
and bases. Consider this pair:
o Orthonormal Separable Meyer Wavelets - an orthonormal basis of perfectly
isotropic generating elements.

o Shearlets - a highly directional tight frame with increasingly anisotropic elements
at fine scales.

In this pair, the wavelets are actually orthonormal, and both wavelets and shearlets
correspond very closely to discrete transforms used in digital image processing. In
digital image processing, the notions of 'radial', 'directional', 'rotation' and so on are
problematic; both orthonormal wavelets and shearlets avoid such concepts. At the
same time this pair offers the same ability to sparsify point and curve singularities
as the counterpart pair we introduced above. This makes it possible to provide a complete
methodology for the continuous and discrete setting (see, e.g., [28, 31]) as well as for
algorithmic realizations (see, e.g., [33, 32]).

While the proof arguments explicitly cover the one frame pair we have taken pains to
define so far, those arguments extend immediately to other 'compatible' pairs, where
the cross-frame matrices are almost diagonal in a suitable sense. This grants us the
freedom to prove results in one system which is convenient, but apply those to another
compatible system. The arguments showing that shearlets and curvelets are compatible
are supplied in [21]. In this paper we discussed the pair radial wavelets/curvelets.
However, all results hold true in a similar way for the pair orthonormal wavelets/shearlets.

• Noisy Data. Are the results studied here robust against small modelling errors? In
fact they are. Consider an image composed of $\mathcal{P}$ and $\mathcal{C}$ with additive noise $\mathcal{N}$, hence
we measure $\tilde{f} = \mathcal{P} + \mathcal{C} + \mathcal{N}$ instead of $f$. We then, as in the noiseless case, filter
to obtain subband components $\tilde{f}_j = \mathcal{P}_j + \mathcal{C}_j + \mathcal{N}_j$ and apply (CSep) to $\tilde{f}_j$ to obtain a
pair $(\tilde{W}_j, \tilde{C}_j)$. Provided that the noise component $\mathcal{N}$ has 'sufficiently' small curvelet
coefficients in the sense that at each scale $j$ the $\ell_1$ norm of the analysis coefficients
satisfies 0(2^/2) 

as J — 7- oo, we again obtain asymptotically perfect separation: 

> 0, J— )-oo 



ll^jl|2 + ||C,||2 

This can be proved along the lines of the proof of Theorem 1.1. Indeed, consider a
composed signal $\tilde{S} = S_1 + S_2 + n$ with components $S_1$ and $S_2$ relatively sparse as in
Proposition 2.1, and noise term $n$ satisfying $\|\Phi_1^T n\|_1 \le \epsilon$ or $\|\Phi_2^T n\|_1 \le \epsilon$. Let $(\tilde{S}_1, \tilde{S}_2)$
solve (2.1) with $S$ substituted by $\tilde{S}$. Then following the proof of Proposition 2.1 line
by line and adapting the arguments accordingly shows:

ll'^i - 5'i II2 + ||5'2 - 'S'2112 < —■ (8.1) 

1 — 2k 

Substituting Proposition 2.1 by estimate (8.1) in the proof of Theorem 1.1 implies the 
result on geometric separation of noisy data stated in Subsection 8.1. 

We conclude that our analysis is indeed stable. 
• Rate of Convergence. One might wonder about the rate of separation. The lemmas
proven in Sections 4-7 imply the following upper bound on the rate of convergence for
$\ell_1$ minimization:

\\yj\\2 + \\tj\\2 

Such information might be the key to getting even stronger separation conclusions. 

• Other Algorithms and Other Notions of Separation. In the companion paper [20]
we show that one pass of alternating hard thresholding, properly tuned, can achieve 
asymptotic separation. Surprisingly, we can even show clean separation at the level 
of wavefront sets. 



8.2 Interpretation as an Uncertainty Principle 

Separation results such as the 'birth problem' of $\ell_1$ component separation, a combination
of sinusoids and spikes [18], were interpreted at that time as uncertainty principles.
As a reminder to the reader, the classical uncertainty principle states that a signal cannot
be highly concentrated in both time and frequency; and a lower bound is placed on the
product of the concentration in time and in frequency. The core property which allows the
separation of sinusoids and spikes by using a dictionary consisting of the unit basis and the
Fourier basis is the non-existence of a sparse representation of a signal both in time and in
frequency.

Considering the present separation problem, these core ideas need to be extended,
thereby providing us with yet another interpretation than the one already presented in
the previous sections. The two representation 'domains' are now the isotropic system of
wavelets and the anisotropic system of curvelets. Hence, we might regard the separation
result of Theorem 1.1 as a statement that a 2D Schwartz distribution cannot be sparsely
represented via analysis coefficients both in the 'isotropic world' and in the 'anisotropic world'.
In particular, if a 2D Schwartz distribution has a sparse representation in wavelets, it is not
sparse in curvelets and vice versa. Phrasing it in more general terms, a 2D Schwartz
distribution having only isotropic features cannot be sparsely represented using an anisotropic
system, and if it exhibits only anisotropic phenomena, it does not possess a sparse
representation in terms of an isotropic system.

Summarizing, comparison with the classical uncertainty principle shows that we here
derive an uncertainty principle for the isotropy-anisotropy relation instead of the classical
time-frequency relation.
9 Proofs

9.1 Proofs of Results from Section 2 
9.1.1 Proof of Proposition 2.1 
Proof. Since $\Phi_1$ and $\Phi_2$ are tight frames,

Now invoke exact decomposition: $S_1^* + S_2^* = S = S_1^0 + S_2^0$. Rewrite the last display:

11^^ - ^?ll2 + 11^2^ - s% < \\<^I{st - s'i)h + W^Usi - 

By definition of k, 
\\^J{St-S'i)h + \\^l{St-S'i)h 

= - s'i)h + \\ls,<^^{st - sf)h + \\lsi'^^{st - sf)h + \\ls|<^^{s'2 - s'2)\\i 

< K . {w^list - sl)h + w^list - + Wis^yAst - sl)h + \\is^y2{s*2 - sl)\W, 

use relative sparsity of the subsignals S^^ i = 1, 2, 
\\^JiSt-S'i)h + \\<i>^{St-S'i)h 

< ^i\\isc^J{St - + \\ls^^^{S*2 - 

+ \\lsi'^JS% + \\lsi'^lS*2\\i + Usi'^ISlWi) 

< Y3^(l|i5rfr5?l|i + l|i5|'J>^52lli + '^)- (9.1) 

Apply minimality of S* and Sg, 

\\lsi<^1Sl\\i + + Usi^lS*2\\i + Us^^^S^i = \\<^>fSt\\i + \\^lS*2\\i 

< \\^Js% + \\<^ls%. 

Again use sparsity of the subsignals 5"?, i = 1, 2, 
\\lSf^JS*i\\l + Us^'^lS*2\\i 

< \\'fjs% + \\<^^s% - iii^i^r^^iii - \\is,<^is'2\\i 

< \\^JS% + \\<!>^S% + \\ls,^J{St - 5?)||i - ||l5,«&r5?||i 

+us,<^^is'2-s'2)\\l-\\ls,<^^s% 

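Using (9.1), this leads to

$\|\Phi_1^T(S_1^* - S_1^0)\|_1 + \|\Phi_2^T(S_1^* - S_1^0)\|_1$
$\le \frac{1}{1-\kappa} \left[ \|1_{\mathcal{S}_1} \Phi_1^T(S_1^* - S_1^0)\|_1 + \|1_{\mathcal{S}_2} \Phi_2^T(S_1^* - S_1^0)\|_1 + 2\delta \right]$
$\le \frac{\kappa}{1-\kappa} \left( \|\Phi_1^T(S_1^* - S_1^0)\|_1 + \|\Phi_2^T(S_1^* - S_1^0)\|_1 \right) + \frac{2\delta}{1-\kappa}.$

Thus, finally we obtain

$\|S_1^* - S_1^0\|_2 + \|S_2^* - S_2^0\|_2 \le \left(1 - \frac{\kappa}{1-\kappa}\right)^{-1} \cdot \frac{2\delta}{1-\kappa} = \frac{2\delta}{1-2\kappa}.$ □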

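9.1.2 Proof of Lemma 2.1

Proof. For each $f$, we choose coefficient sequences $\alpha_1$ and $\alpha_2$ such that $f = \Phi_1 \alpha_1 = \Phi_2 \alpha_2$
and $\|\alpha_i\|_1 \le \|\beta_i\|_1$ for all $\beta_i$ satisfying $f = \Phi_i \beta_i$, $i = 1, 2$. Then, employing the fact that,
because $\Phi_1$ and $\Phi_2$ are tight frames, also $f = \Phi_i \Phi_i^T \Phi_i \alpha_i$, $i = 1, 2$, we obtain

$\|1_{\mathcal{S}_1} \Phi_1^T f\|_1 + \|1_{\mathcal{S}_2} \Phi_2^T f\|_1$
$= \|1_{\mathcal{S}_1} \Phi_1^T \Phi_2 \alpha_2\|_1 + \|1_{\mathcal{S}_2} \Phi_2^T \Phi_1 \alpha_1\|_1$
$\le \sum_{i \in \mathcal{S}_1} \sum_j |\langle \phi_{1,i}, \phi_{2,j} \rangle| |\alpha_{2,j}| + \sum_{j \in \mathcal{S}_2} \sum_i |\langle \phi_{1,i}, \phi_{2,j} \rangle| |\alpha_{1,i}|$
$= \sum_j \left( \sum_{i \in \mathcal{S}_1} |\langle \phi_{1,i}, \phi_{2,j} \rangle| \right) |\alpha_{2,j}| + \sum_i \left( \sum_{j \in \mathcal{S}_2} |\langle \phi_{1,i}, \phi_{2,j} \rangle| \right) |\alpha_{1,i}|$
$\le \mu_c(\mathcal{S}_1, \Phi_1; \Phi_2) \|\alpha_2\|_1 + \mu_c(\mathcal{S}_2, \Phi_2; \Phi_1) \|\alpha_1\|_1$
$\le \max\{\mu_c(\mathcal{S}_1, \Phi_1; \Phi_2), \mu_c(\mathcal{S}_2, \Phi_2; \Phi_1)\} (\|\alpha_1\|_1 + \|\alpha_2\|_1)$
$\le \max\{\mu_c(\mathcal{S}_1, \Phi_1; \Phi_2), \mu_c(\mathcal{S}_2, \Phi_2; \Phi_1)\} (\|\Phi_1^T \Phi_1 \alpha_1\|_1 + \|\Phi_2^T \Phi_2 \alpha_2\|_1)$
$= \max\{\mu_c(\mathcal{S}_1, \Phi_1; \Phi_2), \mu_c(\mathcal{S}_2, \Phi_2; \Phi_1)\} (\|\Phi_1^T f\|_1 + \|\Phi_2^T f\|_1).$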
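The chain of inequalities above can be illustrated on the classical pair of spikes and sinusoids. The sketch below assumes the reading of cluster coherence used in that chain, namely $\mu_c(\mathcal{S}, \Phi_1; \Phi_2) = \max_j \sum_{i \in \mathcal{S}} |\langle \phi_{1,i}, \phi_{2,j} \rangle|$, and takes the identity and the unitary DFT matrix as the two (orthonormal, hence tight) systems; the function names and the choice of clusters are hypothetical and unrelated to the wavelet/curvelet clusters of the paper.

```python
import numpy as np

def cluster_coherence(S, Phi1, Phi2):
    """max over columns j of Phi2 of sum_{i in S} |<phi_{1,i}, phi_{2,j}>|."""
    gram = np.abs(Phi1[:, list(S)].conj().T @ Phi2)
    return float(gram.sum(axis=0).max())

def joint_concentration(f, S1, Phi1, S2, Phi2):
    """(||1_{S1} Phi1^T f||_1 + ||1_{S2} Phi2^T f||_1) / (||Phi1^T f||_1 + ||Phi2^T f||_1)."""
    c1 = np.abs(Phi1.conj().T @ f)
    c2 = np.abs(Phi2.conj().T @ f)
    return float((c1[list(S1)].sum() + c2[list(S2)].sum()) / (c1.sum() + c2.sum()))

n = 64
spikes = np.eye(n)
fourier = np.fft.fft(np.eye(n)) / np.sqrt(n)     # unitary DFT matrix
S1, S2 = range(4), range(4)

mu = max(cluster_coherence(S1, spikes, fourier),
         cluster_coherence(S2, fourier, spikes))
rng = np.random.default_rng(0)
f = rng.standard_normal(n)
print(mu, joint_concentration(f, S1, spikes, S2, fourier))
```

Every inner product between a spike and a unit-norm sinusoid has modulus $n^{-1/2}$, so a cluster of $|\mathcal{S}|$ spikes has cluster coherence $|\mathcal{S}|/\sqrt{n}$, and the printed concentration ratio stays below the printed coherence, as the lemma predicts.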



9.2 Proofs of Results from Section 3 
9.2.1 Proof of Lemma 3.3 

Using Parseval, (7a,fe,e, V'ao.bo) = 2vr / %,b,e{0'4^ao,boiOdC, we consider 



□ 



Now WLOG we may consider the special case $\theta = 0$, so that $R_\theta = I$. Recall that $W$ is
supported on $[1/2, 2]$ by construction. Then

$W(a_0 r) W(a r) = 0 \quad \forall r > 0, \; |\log_2(a/a_0)| \ge 3.$

Hence we need only consider the case where $|\log_2(a/a_0)| < 3$, and in that circumstance
we may WLOG take $a = a_0$. We may also assume $b_0 = 0$. Apply the change of variables
$\zeta = D_a \xi$ and $d\zeta = a^{3/2} d\xi$.

where $\zeta_a = (\zeta_1, \sqrt{a}\,\zeta_2)$ and $\omega(\zeta_a)$ denotes the angular component of the polar coordinates of
$\zeta_a$. Applying integration by parts, for any $k = 1, 2, \ldots$,



\{la,b,e,4'aa,bo)\ = 2t: ■ 0^' ■ \Di j J)[ 



dc. 



Hence 



{l + \Dy^b\'')-\{-fa,b,e,4'ao,bo)\ 



IW^'dlCalDl \V{oo{Ca)/V^)\ + A'^[W\\\Ca\\)V{u{Ca)/V^)] 



dC. (9.2) 



Next we show that, for each k, there exists Cfc < oo such that 

IW^'dlCalDl \V{u{C,)/V^)\ + A'[W^{\\a\)V{iv{Ca)/V^)] dC < Ck, V a > 0. (9.3) 



We have 



and 



,V^C2)\\){Ci) 



^-w\\\a) = v^-^w\\\iCi,v^-)\\){C2). 



Hence, by induction, the absolute values of the derivatives of VF^(||Ca||) are upper bounded 
independently of a. Also, 

oc,i 



and 



^V{uj{Ca)/V^) = J-V^(^((Cl, •)a)/^/^)(Cl) •ff2(C,«), 



and tedious computations show that both $|g_1|$, $|g_2|$ possess an upper bound independently
of $a$. Thus, by induction, the absolute values of the derivatives of $V(\omega(\zeta_a)/\sqrt{a})$ are upper
bounded independently of $a$. These observations imply (9.3).
Further, for each $k = 1, 2, \ldots$,



\Di/ab\)' = (1 + \D,/^b\')^ < ^(1 + I 



(9.4) 



To finish, simply combine (9.2), (9.3), and (9.4), and recall that we chose coordinates
so that $\theta = 0$. Translating back to the case of general $\theta$ gives the full conclusion. □
9.3 Proofs of Results from Section 4
9.3.1 Proof of Lemma 4.1 
Proof. By Parseval, 

where of course $r = |\xi|$. Now

$W(a_{j'} r) W(a_j r) = 0 \quad \forall r > 0, \; |j - j'| > 1,$

hence we may as well assume that $j' = j$. Making the change of variables $\zeta = a_j \xi$, $d\zeta = a_j^2 \, d\xi$
and defining the annulus $\mathcal{A} = \{\zeta : 1/2 \le |\zeta| \le 2\}$,

JA 

Applying integration by parts, for any $k = 0, 1, \dots$,



\{^a„b,Vj)\ = 2^-a//'-|6/aj 



A 



dC. 



< 2^.aTi/^.|6/a,|-'- / A'[W\mC\-'^'] 

JA 

Hence 

(l + |6/a,f )-|(V'a,,b,^j)| <27r-aT 
For W suitably chosen, 



-1/2 



A 



dC. (9.5) 



A 



Further, for each $k = 1, 2, \dots$,



+ 



dC < oo. 



(|6/a,|)^ = (l + |6/a,f)t<^(l + |6/a,f). 
Inserting these last two observations into (9.5), for any $N = 1, 2, \dots$,



\{'>Pa,,,b,Vj)\ < CN ■ a- • l|j_j'|<i • (|6/aj|) 



□ 
Figure 9: Relation between a curvelet $\gamma_{a,b,\theta}$, a line segment $\mathrm{Seg}(a,b,\theta)$, its affine hull
$\mathrm{Line}(a,b,\theta)$, and the points $P_L$ and $P_S$.

9.4 Proofs of Results from Section 5 
9.4.1 Proof of Lemma 5.1 

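We study the situation geometrically, and for each $(a, b, \theta)$ define the line segment

\[ \mathrm{Seg}(a,b,\theta) = \left\{ D_{1/a} R_{-\theta} \left( \begin{pmatrix} 0 \\ y \end{pmatrix} - b \right) : |y| \le \rho \right\}. \]

Two special points associated to these line segments will play an essential role in our estimate; they are defined in

Lemma 9.1 Retain the definitions for $d_1, d_2, \sigma_1, \sigma_2$, and $\tau$ from the statement of Lemma
5.1. Then, for each $a, b, \theta$, the following conditions are fulfilled.

(i) Consider the line

\[ \mathrm{Line}(a,b,\theta) = \left\{ D_{1/a} R_{-\theta} \left( \begin{pmatrix} 0 \\ y \end{pmatrix} - b \right) : y \in \mathbb{R} \right\}. \]

The closest point $P_L$ to the origin on $\mathrm{Line}(a,b,\theta)$ satisfies

\[ \|P_L\|_2^2 = b_1^2 \left( \sigma_2^2 - \sigma_1^{-2} \tau^2 \right) = d_1^2. \]

(ii) Let $P_S$ be the closest point on $\mathrm{Seg}(a,b,\theta)$ to the origin. Then

\[ \|P_S - P_L\|_2^2 = d_2^2. \]

Figure 9 shows a general configuration featuring $P_L$ and $P_S$.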



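Proof. Set

\[ L(y) = D_{1/a} R_{-\theta} \left( \begin{pmatrix} 0 \\ y \end{pmatrix} - b \right), \quad y \in \mathbb{R}. \]

Then

\[ \|L(y)\|_2^2 = \left\| \left( a^{-1} (-b_1 \cos\theta + (y - b_2) \sin\theta),\ a^{-1/2} (b_1 \sin\theta + (y - b_2) \cos\theta) \right) \right\|_2^2 \]

\[ = b_1^2 \sigma_2^2 + (y - b_2)^2 \sigma_1^2 + 2 b_1 (y - b_2) \tau. \qquad (9.6) \]

Since

\[ \frac{d}{dy} \|L(y)\|_2^2 = 2 (y - b_2) \sigma_1^2 + 2 b_1 \tau, \]

it follows by definition of $P_L$ that

\[ P_L = L(b_2 - \sigma_1^{-2} b_1 \tau). \qquad (9.7) \]

Hence, by (9.6),

\[ d_1^2 = \|P_L\|_2^2 = b_1^2 \sigma_2^2 + (-\sigma_1^{-2} b_1 \tau)^2 \sigma_1^2 + 2 b_1 (-\sigma_1^{-2} b_1 \tau) \tau = b_1^2 \left( \sigma_2^2 - \sigma_1^{-2} \tau^2 \right). \]

This proves (i).

To prove (ii), observe that, by (9.7), $P_L \in \mathrm{Seg}(a,b,\theta)$ if and only if $b_2 - \sigma_1^{-2} b_1 \tau \in
[-\rho, \rho]$, which are the two different cases the definition of $d_2$ is separated into. Now, if
$P_L \in \mathrm{Seg}(a,b,\theta)$, then, obviously, $\|P_S - P_L\|_2 = 0 = d_2$. Next assume that $P_L \notin \mathrm{Seg}(a,b,\theta)$. Then

\[ \|P_S - P_L\|_2^2 = \min_{\pm} \|L(\pm\rho) - P_L\|_2^2 \]

\[ = \min_{\pm} \left[ (\pm\rho - b_2)^2 \sigma_1^2 + \sigma_1^{-2} b_1^2 \tau^2 + 2 (\pm\rho - b_2) \sigma_1^{-2} b_1 \tau \sigma_1^2 \right] \]

\[ = \min_{\pm} \left( (\pm\rho - b_2) \sigma_1 + \sigma_1^{-1} b_1 \tau \right)^2. \]

□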
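The closest-point computation in the proof of Lemma 9.1 can be checked numerically. The following is a minimal sketch, not part of the original argument. It assumes that $\sigma_1^2$, $\sigma_2^2$ and $\tau$ are the coefficients obtained by expanding $\|L(y)\|_2^2$ as in (9.6), with $D_{1/a} = \mathrm{diag}(1/a, 1/\sqrt{a})$; the function names are hypothetical.

```python
import math


def line_point(a, b1, b2, theta, y):
    """L(y) = D_{1/a} R_{-theta} ((0, y) - b), with D_{1/a} = diag(1/a, 1/sqrt(a))."""
    c, s = math.cos(theta), math.sin(theta)
    u = y - b2
    return ((-b1 * c + u * s) / a, (b1 * s + u * c) / math.sqrt(a))


def quadratic_coefficients(a, theta):
    """Coefficients of ||L(y)||^2 = b1^2*s2 + (y-b2)^2*s1 + 2*b1*(y-b2)*tau.

    Assumption: s1, s2, tau play the roles of sigma_1^2, sigma_2^2, tau in (9.6).
    """
    c, s = math.cos(theta), math.sin(theta)
    sigma1_sq = s * s / a**2 + c * c / a
    sigma2_sq = c * c / a**2 + s * s / a
    tau = (1.0 / a - 1.0 / a**2) * s * c
    return sigma1_sq, sigma2_sq, tau


def sq_norm(a, b1, b2, theta, y):
    """Squared Euclidean norm of L(y)."""
    x, z = line_point(a, b1, b2, theta, y)
    return x * x + z * z


def closest_point_on_line(a, b1, b2, theta):
    """Return (y*, ||L(y*)||^2, closed-form value b1^2 (s2 - tau^2 / s1))."""
    sigma1_sq, sigma2_sq, tau = quadratic_coefficients(a, theta)
    y_star = b2 - b1 * tau / sigma1_sq  # minimizer from (9.7)
    numeric = sq_norm(a, b1, b2, theta, y_star)
    closed_form = b1 * b1 * (sigma2_sq - tau * tau / sigma1_sq)
    return y_star, numeric, closed_form
```

Calling `closest_point_on_line(0.05, 0.3, -0.2, 0.7)` returns the minimizer together with two values of $\|P_L\|_2^2$ that should agree up to rounding; moving `y` away from the minimizer in `sq_norm` should only increase the norm.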

Now define the ray integral

\[ R_N(x_0, y_0) = \int_{y_0}^{\infty} \langle |(x_0, t)| \rangle^{-N}\, dt, \]

i.e., we integrate along the vertical ray $\mathcal{R}(x_0, y_0)$ whose 'lowest' point is $(x_0, y_0)$. The geometry
of the previous lemma allows us to control the curvelet coefficient of a linear singularity
by a ray integral, properly deployed. Farther below we will prove:



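Lemma 9.2 Let

\[ \langle w\mathcal{L}, \gamma_{a,b,\theta} \rangle = \int_{-\rho}^{\rho} w(x_2/\rho)\, \gamma_{a,b,\theta}(0, x_2)\, dx_2, \]

where $\|w\|_\infty \le 1$. Then

\[ |\langle w\mathcal{L}, \gamma_{a,b,\theta} \rangle| \le c_N \cdot a^{-3/4} \cdot \sigma_1^{-1} \cdot R_N(d_1, \sigma_1 d_2). \]

The next lemma gives a bound on the ray integral which, combined with the last lemma,
finishes the proof of Lemma 5.1.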

Lemma 9.3 For $y_0 \ge 0$,

\[ R_N(x_0, y_0) \le \pi \cdot \langle |x_0| \rangle^{-1} \cdot \langle |(x_0, y_0)| \rangle^{2-N}. \qquad (9.8) \]
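Proof of Lemma 9.3. For $\beta \in (0, 1)$,

\[ \int_0^\infty |f(t)|\, dt \le \Big( \sup_{t \in (0,\infty)} |f(t)|^{\beta} \Big) \cdot \int_0^\infty |f(t)|^{1-\beta}\, dt. \]

Now setting $(1 - \beta) N = 2$ and $f(t) = \langle |(x_0, y_0 + t)| \rangle^{-N}$, we have

\[ R_N(x_0, y_0) \le \Big( \sup_{v \in \mathcal{R}(x_0, y_0)} \langle |v| \rangle^{2-N} \Big) \cdot \int_0^\infty \langle |(x_0, y_0 + t)| \rangle^{-2}\, dt. \]

Since

\[ \int_{-\infty}^{\infty} \langle |(x_0, y)| \rangle^{-M}\, dy = \langle |x_0| \rangle^{-M} \cdot \int_{-\infty}^{\infty} \langle |y / \langle |x_0| \rangle| \rangle^{-M}\, dy = \langle |x_0| \rangle^{-M+1} \cdot \int_{-\infty}^{\infty} \langle |t| \rangle^{-M}\, dt, \]

setting $M = 2$ and recalling $\pi = \int_{-\infty}^{\infty} (1 + t^2)^{-1}\, dt$, it follows that

\[ \int_0^\infty \langle |(x_0, y_0 + t)| \rangle^{-2}\, dt \le \pi \cdot \langle |x_0| \rangle^{-1}. \]

Meanwhile, since $y_0 \ge 0$,

\[ \sup_{v \in \mathcal{R}(x_0, y_0)} \langle |v| \rangle^{2-N} = \langle |(x_0, y_0)| \rangle^{2-N}. \]

This proves (9.8). □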
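As a sanity check on (9.8), the ray integral can be approximated numerically and compared with the stated bound. This is only an illustrative sketch: it assumes the analyst's bracket $\langle |x| \rangle = (1 + |x|^2)^{1/2}$, takes $N > 2$, and uses a simple midpoint rule after a substitution that maps the ray onto a bounded interval. The function names are hypothetical.

```python
import math


def bracket(x):
    """Analyst's bracket <|x|> = (1 + x^2)^(1/2) (assumed definition)."""
    return math.sqrt(1.0 + x * x)


def ray_integral(x0, y0, N, n=20000):
    """Midpoint-rule approximation of R_N(x0, y0) = int_{y0}^inf <|(x0, t)|>^(-N) dt.

    The substitution t = y0 + s / (1 - s) maps s in [0, 1) onto [y0, inf).
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        t = y0 + s / (1.0 - s)
        jacobian = 1.0 / (1.0 - s) ** 2
        total += bracket(math.hypot(x0, t)) ** (-N) * jacobian
    return total * h


def ray_bound(x0, y0, N):
    """Right-hand side of (9.8): pi * <|x0|>^(-1) * <|(x0, y0)|>^(2 - N)."""
    return math.pi * bracket(x0) ** (-1) * bracket(math.hypot(x0, y0)) ** (2 - N)


if __name__ == "__main__":
    for N in (3, 4, 6):
        for x0 in (0.0, 0.5, 3.0):
            for y0 in (0.0, 1.0, 5.0):
                print(N, x0, y0, ray_integral(x0, y0, N), ray_bound(x0, y0, N))
```

For $x_0 = y_0 = 0$ and $N = 3$ the integral is $\int_0^\infty (1 + t^2)^{-3/2}\, dt = 1$, which gives a quick check on the quadrature itself.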

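Proof of Lemma 9.2. By Lemma 3.2, Lemma 9.1, and using the fact that $\|w\|_\infty \le 1$,

\[ |\langle w\mathcal{L}, \gamma_{a,b,\theta} \rangle| = \left| \int_{-\rho}^{\rho} w(y/\rho)\, \gamma_{a,b,\theta}(0, y)\, dy \right| \]

\[ \le \int_{-\rho}^{\rho} |\gamma_{a,b,\theta}(0, y)|\, dy \]

\[ \le c_N \int_{\mathrm{Seg}(a,b,\theta)} a^{-3/4} \langle |v| \rangle^{-N}\, dv, \]

where we used an affine transformation of variables to turn the anisotropic norm $|(0, y)|_{a,\theta}$
into the Euclidean norm $|v|$; the same transformation turns $\{0\} \times [-\rho, \rho]$ into $\mathrm{Seg}(a,b,\theta)$.
In the final expression, the integral is along a non-unit-speed curve traversing $\mathrm{Seg}(a,b,\theta)$
at speed $\sigma_1$. Now let $\mathrm{Ray}(a,b,\theta)$ denote the ray starting from $P_S$ and initially traversing
$\mathrm{Seg}(a,b,\theta)$. We continue with

\[ c_N \cdot a^{-3/4} \cdot \int_{\mathrm{Seg}(a,b,\theta)} \langle |v| \rangle^{-N}\, dv \le c_N \cdot a^{-3/4} \cdot \int_{\mathrm{Ray}(a,b,\theta)} \langle |v| \rangle^{-N}\, dv \]

\[ = c_N \cdot a^{-3/4} \cdot \sigma_1^{-1} \int_{\sigma_1 \cdot \mathrm{Ray}(a,b,\theta)} \langle |w| \rangle^{-N}\, dw \qquad (9.9) \]

\[ = c_N \cdot a^{-3/4} \cdot \sigma_1^{-1} \int_{\sigma_1 d_2}^{\infty} \langle |(d_1, t)| \rangle^{-N}\, dt \]

\[ = c_N \cdot a^{-3/4} \cdot \sigma_1^{-1} \cdot R_N(d_1, \sigma_1 d_2). \]

In (9.9), the integral involves a unit-speed curve traversing $\mathrm{Ray}(a,b,\theta)$, which explains the
appearance of the speed factor $\sigma_1^{-1}$. □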
9.4.2 Proof of Lemma 5.2

By definition of the line singularity $w\mathcal{L}$ and by (5.1), we can rewrite $\langle w\mathcal{L}, \gamma_{a,b,\theta} \rangle$ in the
following way:

27r{wC,ja,b,e) = J (w * %,bMi^O)d^i- (9-10) 

Since 



it follows that 

By (9.10), this implies 



27T{wC,ja,b,e) = I e'''^' I p ■ W2{-pm)laflfi{ii,m)e''''''dm 



ib2V2, 



Repeatedly applying integration by parts, and incorporating analyst's brackets $\langle |\cdot| \rangle$ as in
the proof of Lemma 4.1, we obtain

2n\{wCna,b,e)\ < (l&ll)"^ • (1^21)-''' • l|/iL,A/||Li(R), (9.11) 

where 

hL,Mi^i) = p- 1 D'^'^'{M-PV2)%MCi,m)e'''''')dm 
and for some 'nice' / G L^(R^), 

o^'"/(m,,.) = (^)'(4)"/ta.,.). 

Next, we will estimate the term $|h_{L,M}(\xi_1)|$ from (9.11), and prove that
|^l,m(6)I < c-a^^^-e-P^-^ ■ (a^/^i sme\+a\ cos9\)^ ■ {p + a^^^\ cos0| +a| sin0|)*-^ (9.12) 

Let Ha^g(^i) denote the support of the function i— )• D^''^\w{pr]2)ja,o,e{^i:'n2)e^^^^^)- Then 
hL,M can be written as 

/iL,A/(6) = /o- / D'^'^'(w2{-pV2)%,oA^i,V2)e''''^Adri2. (9.13) 

We next rewrite the integrand as 

Z)i.M(^^2(-pr?2)7a,o,e(ei,??2)e*'^^^ 

E U 4'"^(-W2)(-p)"I?^'^-^--(7a,o,.(6,r/2)e^^^^^). 



m=0 
This allows us to estimate $|h_{L,M}(\xi_1)|$ using (9.13) and (6.2) by

M 



m=0 
M 

< 

m=0 
M 



E U P^^' ■ ll^^^(^-)II^M|sin.|/(2a),oo)A^^''''-"^(a,^) 



m=0 



^ c.5:(^)p™+^.e-''^.iV^'*^-(a,^), (9.14) 



where 



Since, by simple decay estimates, 

\Di'%flAvi,m)\ < Cl ■ a^/' • (o| cos^l + a'^^l sin0|)^ 

and 

\D^%fiAm,V2)\ < Cm ■ a^^" ■ (a^/^l cos0| + a\ smO])^, 
the term N^'^^~'^{a, 9) can be estimated by 

Af^'^-'"(a, e) < CL,M ■ a^/^ • («| cos 9] + a^/^l sin 9\f{a^/'^\ cos 0| + a\ sm9\)^. 

Combining this finding with (9.14) proves (9.12). 

Thus, in particular, by the support of the function $h_{L,M}$, the $L^1$-norm of this function
can be estimated as



< c-a-^ • |cos0| -a^/^ -e"^^ • {a^/^\sm9\ + a\cos9\)^ • (/) + a^/^os ^1 + a| sin^l) 
Combining this estimate with (9.11) yields 

\{wC,ja,b,e)\ < CM^L-a-^/^ • Icos^l -e-"^ • • {a}/^\sin9\+ a\cos9\)^ 

iM)'^^ ■ {p + a}''^\cos9\+a\sm9\Y'' , 

as claimed. □ 



9.5 Proofs of Results from Section 7 
9.5.1 Proof of Lemma 7.8 

Proof. Below, various constants will appear, which for simplicity shall all be denoted by $c$.
Also we write $b_\eta$, $b_{\eta'}$, etc. in place of the full notation $b_{j,k,\ell} = R_{\theta_{j,\ell}}^{-1} D_{2^{-j}} k$, etc. Throughout
the proof we associate $\eta$ with the triple $(j, k, \ell)$ and similarly for $\eta'$ and $(j', k', \ell')$.
Suppose first that $j' \ge j$. Then

< c • (2^''^ + 2(^^-^'')/¥|2 + 2^^ - + 2^\{e^,,b-, - bk')\) . 
Similarly, if j' < j: 

u;{r^',fj) < c ■ (2^-^' + \2^^'-i)IH-^\^ + 2^' ■ \b-^ - + \{er,' ,b-, - bk>)\) .(9.15) 

Using the model (7.6), we now seek to prove that, for sufficiently large j, 

: \M'j{v',v)\ > CN ■ n-^} = #{?/ : co{v' , {j,~k,i)) < n,} > nj. (9.16) 

We first study the case j' < j. Let uj{i]',fj) denote the RHS of (9.15). Note that, if 
we can prove (9.16) with uj{r]' ,f]) in place of uj{r]' ,f]), this immediately implies the original 
claim of (9.16). We repeatedly use the (trivial) 

Lemma 9.4 For each fixed $x \in \mathbb{R}$, the set of $k \in \mathbb{Z}$ satisfying $|k - x| \le R$ has cardinality
$\ge R - 1$.
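The counting statement of Lemma 9.4 is easy to confirm by direct enumeration. The sketch below assumes the closed condition $|k - x| \le R$ with $R \ge 0$; the helper name is hypothetical.

```python
import math


def count_integers_within(x, R):
    """Number of integers k with |k - x| <= R, for R >= 0."""
    lo = math.ceil(x - R)   # smallest admissible integer
    hi = math.floor(x + R)  # largest admissible integer
    return max(0, hi - lo + 1)


if __name__ == "__main__":
    for x, R in [(0.3, 2.5), (0.5, 0.4), (-1.75, 3.0)]:
        print(x, R, count_integers_within(x, R), R - 1)
```

The interval $[x - R, x + R]$ has length $2R$, so the count is in fact at least $2R - 1$; the lemma only needs the weaker bound $R - 1$.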

Define Cj j, j = nj/c — 2^ ~^ , where c is the constant in (9.15). The condition Co < nj is 
equivalent to 

||20-'-jO/2^~-/|2 + 2^''.|6^.-6fc,|' + 2^"Kev,^fc-fefc'>l < C^j',]- (9-17) 

We next derive conditions making each of the three terms smaller than Cjj,j/3, thus 
implying (9.17). Fix /. By Lemma 9.4, there are at least C^^^ -./^/S — 1 integer values 
/ G Z obeying 

Now, define x = x{fj,r]') = R^^^ ^bj,. Since Rg^, ^, is an isometry. 
At the same time, 

KV'^fc - bk')\ = \xi - 2~^'k[\. 
Now there are at least ^^^^v j/v^ ~ 1 integers G Z obeying 

\2^'^'i2 - k',\ < C^!'j/V6. (9.19) 
For large j' , there are at least C- ■,-■/?> — 1 integers k'^ G Z obeying both

|2^-'ii-A;'i|<2^-'/2.C7^/2^/V6 
and

\2^'xi-k[\ < C^ .,j/3. (9.20) 
Every pair k' = (k'l, k'2) € satisfying the conditions (9.19)-(9.20) simultaneously, satisfies 
- hk? < q^.,j/3 and \{e.^, ,h~^ - 6^,)! < ■,■/?.. (9.21) 

Combining the above displays, we have at least {C ■■,-■/?> — 1) • (C^^.^-/\/6 — 1) points 

k' = {k[,k'2) G Z2 satisfying (9.21). Every pair {k'J') satisfying (9.21) and (9.18) satisfies 
(9.17). So focusing just on j' = j, we obtain a large number of pairs {k' , (.') satisfying (9.17): 
> c • j, j ^ c ■ ^ iij such pairs. In particular, for all large j, inequality ui{fj,i]') < nj 
is satisfied by at least rij triples rj' = (j', k' 

For the case j' < j, using (9.15), we can similarly prove that uj < rij is satisfied by at 
least Uj triples {j',k',£'). 

Finally, we observe that uj{f],r]') < rij can only hold if 

1/ - j| < c • lognj, \2^^'~'^^'H-l'\ <c-nj (9.22) 

and 

2^' • max{ I bj^ - bk'lMbj^ - bk'\} < c ■ . (9.23) 
Concluding, each $\eta'$ contained in the sets in (9.16) must satisfy both (9.22) and (9.23). □

9.5.2 Proof of Lemma 7.9 

For r] £ Sj, we have ^ = and I62I < n while < 2p/2^/'^. WLOG suppose that, for 
the patch i in question, we have that the r/' G S'j of interest has b' = Rg/D^-j' k' for some 9 
obeying |^| < cnj/2^^'^. For such a pair {rj,rj'), note that 

k = D2jRo^b, k' = D^j'Rg^b'; 
it is also convenient to define x' = D^j-j'k'. Then 

k2 - x'2 = 2^/2 ((1 _ cos{e'))b'2 + {b2 - b'2) - sin(^')^'i) • 
Now \b'^\ < C for 1]' € FWD(ry), and, from Lemma 7.8 we infer 

1 1 - cos(^') I < cn] /2^ , 1 6'2 - &2 1 < crij /2^ , | sin(e') I < crij /2^/^ . 
Combining these, 

1^2 — X2\ < 2cn'j , 

so I X2 1 > I A;2 1 — 2cn| , and 

141 = 2^^'-^y^\x'2\ > 2-l^-^^'l/2 . {\k2\ - 2cn]) . 

Now 2l-'~-''l < cuj for rj' G fwd(77), so \k{r])\ < \k2{r])\ + \ ki{r])\ < \k2{r])\ + crij, for t] £ Sj. 
So 

\k{7i')\ >\k'2\> 2-l^-^-'l/2 . (1^21 - 2cn|) , J > JO. 

□ 
9.5.3 Proof of Lemma 7.10

Let $Q_\ell$ denote the square of sidelength $2^\ell + 1$ centered at the origin. Within each annulus
$A_\ell = Q_\ell - Q_{\ell-1}$ there are, for $\ell \ge 2$, at most $2^{2\ell}$ lattice points in the annulus and each one has
$\ell^2$ norm at least $2^{\ell-2}$. Partition the sum $\sum_{\mathbb{Z}^2} = \sum_{Q_1} + \sum_{A_2} + \sum_{A_3} + \cdots$. Starting at $\ell = 2$ each
annulus contributes at most $2^{2\ell} \cdot \langle (a 2^{\ell-2} - b)_+ \rangle^{-N}$ to the sum, and the inner square $Q_1$
contributes at most 9. Then

\[ \sum_{k \in \mathbb{Z}^2} \langle (a|k| - b)_+ \rangle^{-N} \le 9 + \sum_{\ell=2}^{\infty} 2^{2\ell} \langle (a 2^{\ell-2} - b)_+ \rangle^{-N} = 9 + 16 \cdot \left( \sum_{m=0}^{\infty} 2^{2m} \langle (a 2^m - b)_+ \rangle^{-N} \right). \]

Let $m_0$ satisfy $2b > a 2^{m_0} \ge b$. Now

\[ T_1 := \sum_{m=0}^{m_0} 2^{2m} \langle (a 2^m - b)_+ \rangle^{-N} \le \sum_{m=0}^{m_0} 2^{2m} \le 2^{2 m_0 + 1}. \]

Now $2^{2 m_0} < (2b/a)^2$, so $2^{m_0} < 2b/a$. So $T_1 \le 2 \cdot (2b/a)^2$.

On the other hand, a2™ - 6 > for m > mo, while a2"^-™o < a2™ - 6. Then 

oo 

Ta := J]] 22'"(|a2™-™o|)^^ < ^ 22™(a2™'^'"o)^^ 

m>mo m=mo+l 



< 22(™o+^)(a2-'"o)^^22''2" 



fc=0 



= 2™o(2-7V) .^^.2. (i_2-(^-2))-i 

< (6/a)2-^ • • 4 

< 4 • (26/a)2 . 6"^ 

Hence 

Ti + < {b/af ■ (8 + 166"^) . 
Combining these displays gives the lemma, with explicit constants. □ 
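The first step of this proof, the comparison of the lattice sum with a dyadic sum over annuli, can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it assumes $\langle |x| \rangle = (1 + |x|^2)^{1/2}$, truncates both sides at the same square $Q_L$ (for which the annulus argument still applies term by term), and uses hypothetical function names.

```python
import math


def bracket(x):
    """Analyst's bracket <|x|> = (1 + x^2)^(1/2) (assumed definition)."""
    return math.sqrt(1.0 + x * x)


def lattice_sum(a, b, N, L):
    """Sum of <(a|k| - b)_+>^(-N) over lattice points k in Q_L, i.e. |k_i| <= 2^(L-1)."""
    half = 2 ** (L - 1)
    total = 0.0
    for k1 in range(-half, half + 1):
        for k2 in range(-half, half + 1):
            r = math.hypot(k1, k2)
            total += bracket(max(a * r - b, 0.0)) ** (-N)
    return total


def dyadic_bound(a, b, N, L):
    """9 (inner square Q_1) plus 2^(2l) * <(a 2^(l-2) - b)_+>^(-N) for l = 2..L."""
    total = 9.0
    for l in range(2, L + 1):
        total += 2 ** (2 * l) * bracket(max(a * 2 ** (l - 2) - b, 0.0)) ** (-N)
    return total


if __name__ == "__main__":
    for a, b, N in [(1.0, 2.0, 3), (0.5, 1.0, 4), (2.0, 0.1, 5)]:
        print(a, b, N, lattice_sum(a, b, N, 5), dyadic_bound(a, b, N, 5))
```

When $b$ is so large that every term equals 1, the left side simply counts the $(2^L + 1)^2$ lattice points of $Q_L$, which makes the slack in the annulus count visible.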